
pve-201 Deployment Runbook

This is the bootstrap procedure for hosting https://ui-dashboard.gnerim.ru/ on pve-201, plus rehearsal recipes for the CI/CD pipeline failure paths. The full design rationale lives in docs/superpowers/specs/2026-04-25-cicd-pipeline-design.md.

One-time setup

1. SSH tunnel pve-201 → webzavod (TIM API access)

The customer WAF on flights.test.aeroflot.ru only accepts requests from corp-VPN egress IPs. nginx proxies /api/ and /map/api/ to https://127.0.0.1:8443, which is a local SSH port forward to webzavod. webzavod terminates the corp VPN on ppp0 and makes the onward connection to flights.test.aeroflot.ru:443. A systemd unit keeps the tunnel up.

On webzavod (192.168.88.58), append pve-201's public key (copy it from pve-201's ~gnezim/.ssh/id_rsa.pub) to ~gnezim/.ssh/authorized_keys, with permitopen restricting forwarding to a single host:port. This is a one-time step:

command="exit 1",no-pty,no-X11-forwarding,no-agent-forwarding,no-user-rc,permitopen="flights.test.aeroflot.ru:443" ssh-rsa AAAA…== pve-201-flights-tim-tunnel

On pve-201 — install + enable the systemd unit:

cd /path/to/Aeroflot.Flights.Web
sudo cp deployment/systemd/flights-tim-tunnel.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now flights-tim-tunnel.service
sudo systemctl status flights-tim-tunnel.service --no-pager
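When the unit misbehaves, it can help to run the forward by hand in the foreground. The sketch below assumes the unit runs a plain ssh local forward. The -o options are standard OpenSSH client options, but whether the real unit uses them is an assumption, and the helper name tim_tunnel_cmd is hypothetical. deployment/systemd/flights-tim-tunnel.service remains the source of truth.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ssh command assumed to be equivalent to the one in
# flights-tim-tunnel.service. Compare with the real unit before relying on it.
tim_tunnel_cmd() {
  local remote="${1:-gnezim@192.168.88.58}"   # webzavod, per the authorized_keys step
  # -N: open no remote session (the key's forced command="exit 1" would end it anyway)
  # -L: listen on 127.0.0.1:8443 here; webzavod connects on to flights.test.aeroflot.ru:443
  # ExitOnForwardFailure: exit instead of running without the forward if 8443 cannot be bound
  echo ssh -N \
    -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=30 -o ServerAliveCountMax=3 \
    -L 127.0.0.1:8443:flights.test.aeroflot.ru:443 \
    "$remote"
}
```

To use it, stop the unit first (sudo systemctl stop flights-tim-tunnel.service) so 8443 is free, run $(tim_tunnel_cmd) as gnezim on pve-201, and start the unit again when done.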

Smoke test:

ss -ltn | grep ':8443\b'    # expect: a 127.0.0.1:8443 LISTEN line
curl -k --resolve flights.test.aeroflot.ru:8443:127.0.0.1 \
  -o /dev/null -w 'swagger: %{http_code}\n' \
  https://flights.test.aeroflot.ru:8443/swagger/index.html   # expect 401
curl -k --resolve flights.test.aeroflot.ru:8443:127.0.0.1 \
  -o /dev/null -w 'api/health: %{http_code}\n' \
  https://flights.test.aeroflot.ru:8443/api/health           # expect 200

If swagger returns 200 with HTML body instead of 401, the tunnel is bypassed and the request egressed directly — fix the listener / SSH unit before proceeding.
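The two smoke-test outcomes can be folded into one verdict. This is a minimal sketch with hypothetical helper names (classify_tunnel_smoke, tim_smoke); it reuses the same curl calls as the smoke test and encodes only the expectations stated above (swagger 401, api/health 200, swagger 200 meaning the tunnel was bypassed).

```shell
#!/usr/bin/env bash
# Hypothetical helper: interprets the two smoke-test status codes.
# Expected through the tunnel: swagger -> 401, api/health -> 200.
classify_tunnel_smoke() {
  local swagger="$1" health="$2"
  if [ "$swagger" = "401" ] && [ "$health" = "200" ]; then
    echo "ok"
  elif [ "$swagger" = "200" ]; then
    echo "bypassed"    # request egressed directly instead of via webzavod
  else
    echo "fail"        # listener down, tunnel broken, or upstream error (000 = no connection)
  fi
  # Exit status: 0 only when both codes match the expected values.
  [ "$swagger" = "401" ] && [ "$health" = "200" ]
}

# Runs the live checks and classifies them.
tim_smoke() {
  local url="https://flights.test.aeroflot.ru:8443" s h
  s=$(curl -sk --resolve flights.test.aeroflot.ru:8443:127.0.0.1 -o /dev/null -w '%{http_code}' "$url/swagger/index.html")
  h=$(curl -sk --resolve flights.test.aeroflot.ru:8443:127.0.0.1 -o /dev/null -w '%{http_code}' "$url/api/health")
  classify_tunnel_smoke "$s" "$h"
}
```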

2. nginx vhost

cd /path/to/Aeroflot.Flights.Web
sudo cp deployment/nginx/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-available/
sudo ln -sf /etc/nginx/sites-available/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-enabled/
sudo mkdir -p /etc/nginx/htpasswd
sudo nginx -t
sudo systemctl reload nginx

The htpasswd file itself is created by scripts/ci/install-htpasswd.sh on the first deploy, from the BASIC_AUTH_USER and BASIC_AUTH_PASS secrets (see step 7).
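A quick way to confirm the files landed where the commands above put them. vhost_enabled and NGINX_ROOT are hypothetical names; the override exists only so the check can be exercised against a scratch directory, and it defaults to /etc/nginx. The htpasswd file itself is not checked, since it only appears after the first deploy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks the vhost is installed, enabled via symlink, and that
# the htpasswd directory exists. NGINX_ROOT (hypothetical) defaults to /etc/nginx.
vhost_enabled() {
  local name="${1:-ui-dashboard.gnerim.ru.conf}"
  local root="${NGINX_ROOT:-/etc/nginx}"
  local avail="$root/sites-available/$name" link="$root/sites-enabled/$name"
  [ -f "$avail" ] || { echo "missing: $avail" >&2; return 1; }
  [ -L "$link" ] || { echo "not a symlink: $link" >&2; return 1; }
  # ln -sf above stores an absolute target, so compare the literal link target.
  [ "$(readlink "$link")" = "$avail" ] || { echo "symlink points elsewhere: $link" >&2; return 1; }
  [ -d "$root/htpasswd" ] || { echo "missing dir: $root/htpasswd" >&2; return 1; }
  echo "ok: $name enabled"
}
```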

3. Gitea runner setup

The runner user must be in the docker group (so it can use the Docker socket without sudo) and must be able to reach Gitea and GitLab:

sudo usermod -aG docker <runner-user>     # then restart the runner service; group changes only apply to new sessions
sudo -u <runner-user> docker ps           # run as the runner user; must succeed without root
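Group membership can also be checked directly from the group database. user_in_group is a hypothetical helper built on id -nG; it reports what the database says, so a runner process started before usermod still needs a restart to pick the group up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if USER is listed in GROUP (default: docker) per `id -nG`.
user_in_group() {
  local user="$1" group="${2:-docker}" g
  # Unknown users print nothing on stdout, so the loop does not run and we return 1.
  for g in $(id -nG "$user" 2>/dev/null); do
    [ "$g" = "$group" ] && return 0
  done
  return 1
}
```

Example: user_in_group <runner-user> docker || echo "add the runner user to the docker group".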

Reachability checks the runner must pass:

curl -fsS https://git.gnerim.ru/                                        # Gitea
curl -fsSI https://teamscore.gitlab.yandexcloud.net/                    # GitLab
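Both checks can be run in one go as the runner user. check_reachable is a hypothetical helper; unlike the GitLab command above it uses GET rather than HEAD (-I) for every URL, so fall back to the original commands if a host treats the two differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probes each URL with curl and reports it.
# Returns the number of failures (0 = everything reachable).
check_reachable() {
  local url failed=0
  for url in "$@"; do
    # -f: treat HTTP >= 400 as failure; redirects (3xx) count as reachable
    if curl -fsS -o /dev/null --max-time 15 "$url"; then
      echo "ok    $url"
    else
      echo "FAIL  $url" >&2
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
```

Usage: check_reachable https://git.gnerim.ru/ https://teamscore.gitlab.yandexcloud.net/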

The customer Jenkins URL and the customer site (flights-ui.devwebzavod.ru) are NOT reachable from the runner directly, and Workflow B does not call them. Customer-side e2e (Workflow C, release-verify) runs only after the operator has manually triggered the Jenkins build. It reaches the customer URL the same way the upstream API is reached: by direct egress where possible, or through additional tunnels added on demand.

4. GitLab Personal Access Token

GitLab → User Settings → Access Tokens → create with scopes api and write_repository. Store as Gitea Actions secret GITLAB_PAT.

5. Allow self-approve on GitLab project

GitLab → flights-front project → Settings → Merge requests → Approval rules → uncheck "Prevent approval by author" (skip if you can already approve your own MRs in the GitLab UI).

Verify by running (locally, after PAT is in place):

GITLAB_PAT=<pat> ./scripts/ci/check-gitlab-project.sh

It prints the numeric project ID (store as GITLAB_PROJECT_ID secret) and confirms self-approve is allowed.
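check-gitlab-project.sh is the tool to use; the sketch below only illustrates the kind of GitLab REST calls such a check can rely on, for debugging a failing run by hand. GET /api/v4/projects/:id accepts a URL-encoded "namespace/project" path, and the project-level GET /api/v4/projects/:id/approvals reports merge_requests_author_approval (true means authors may approve their own MRs). Assumptions: the flights-front namespace (not stated in this runbook), GITLAB_URL taken from the reachability check, and the approvals endpoint being available on this GitLab tier. All helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repo's check-gitlab-project.sh.
GITLAB_URL="${GITLAB_URL:-https://teamscore.gitlab.yandexcloud.net}"

# Encodes "/" only, which is enough for typical namespace/project paths.
urlencode_path() {
  printf '%s' "$1" | sed 's#/#%2F#g'
}

gitlab_get() {
  if [ -z "${GITLAB_PAT:-}" ]; then
    echo "GITLAB_PAT is not set" >&2
    return 1
  fi
  curl -fsS --header "PRIVATE-TOKEN: $GITLAB_PAT" "$GITLAB_URL/api/v4/$1"
}

# Project JSON; its "id" field is the value for GITLAB_PROJECT_ID.
gitlab_project() {
  gitlab_get "projects/$(urlencode_path "$1")"
}

# Project approval settings; look for "merge_requests_author_approval": true.
gitlab_approval_settings() {
  gitlab_get "projects/$1/approvals"
}
```

Example (namespace is a placeholder): GITLAB_PAT=<pat> gitlab_project <namespace>/flights-front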

6. Telegram bot (optional)

Use an existing bot or create one via @BotFather. To get the chat_id, send the bot a message, then query https://api.telegram.org/bot<TOKEN>/getUpdates. Store the values as TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID.
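The chat_id can be pulled out of the getUpdates response without reading the JSON by eye. chat_ids is a hypothetical helper that avoids a jq dependency by using grep/sed; it assumes the compact JSON the Bot API returns, where each chat object starts with "id". If jq is installed, jq '.result[].message.chat.id' is the sturdier choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists distinct chat IDs found in a getUpdates response on stdin.
# Group chat IDs are negative, hence the optional leading "-".
chat_ids() {
  grep -o '"chat":{"id":-\{0,1\}[0-9]*' | sed 's/.*://' | sort -u
}
```

Usage: curl -fsS "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getUpdates" | chat_ids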

If either secret is unset, all notify-telegram.sh calls in the workflows skip cleanly with no error — the pipeline runs end-to-end without Telegram configured.
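The skip-when-unset behaviour can be sketched as follows. This is a hypothetical notify_telegram function, not the repo's notify-telegram.sh; it calls the Bot API sendMessage method. Ignoring send failures (rather than failing the step) is an assumed design choice, not something this runbook states.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a notifier that is a no-op when Telegram is not configured.
notify_telegram() {
  local text="$1"
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "telegram: not configured, skipping" >&2
    return 0
  fi
  # --data-urlencode sends a form-encoded POST; failures are reported but not propagated.
  curl -fsS -o /dev/null --max-time 10 \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=${text}" \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    || echo "telegram: send failed (ignored)" >&2
  return 0
}
```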

7. Gitea Actions secrets summary

Repo → Settings → Actions → Secrets. Set the following (GITHUB_TOKEN is provided automatically):

Secret                                Required            Purpose
BASIC_AUTH_USER, BASIC_AUTH_PASS      yes                 nginx htpasswd credentials for ui-dashboard.gnerim.ru
MAP_TILE_URL                          optional            Map tile URL template; default /map/api/tile/{z}/{x}/{y}.jpeg
API_BASE_URL                          optional            API base path; default /api
GITLAB_PAT, GITLAB_PROJECT_ID         yes (release only)  GitLab MR API
TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID  optional            Notifications
GITHUB_TOKEN                          auto                Provided by Gitea Actions; no manual setup required

Jenkins is triggered manually after the release workflow merges to GitLab; no Jenkins secret is required.
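A pre-flight step can fail fast with a readable message instead of a confusing error further down. This sketch assumes the workflow maps each secret into an environment variable of the same name; missing_secrets and its "release" mode are hypothetical, with the required sets taken from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: reports required secrets that are empty.
# Default mode checks the always-required pair; "release" adds the GitLab pair.
missing_secrets() {
  local mode="${1:-deploy}" name val missing=""
  local required="BASIC_AUTH_USER BASIC_AUTH_PASS"
  if [ "$mode" = "release" ]; then
    required="$required GITLAB_PAT GITLAB_PROJECT_ID"
  fi
  for name in $required; do
    # Indirect lookup; names come only from the fixed list above.
    eval "val=\${$name:-}"
    [ -n "$val" ] || missing="$missing $name"
  done
  if [ -n "$missing" ]; then
    echo "missing secrets:$missing" >&2
    return 1
  fi
  return 0
}
```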

Verifying failure paths

Run at least the rollback and "release blocked" rehearsals once before declaring the pipeline production-grade.

A: e2e fail → rollback

Push a commit that adds console.error('rehearsal') somewhere that runs on every page (e.g. src/routes/layout.tsx). Workflow A runs, the e2e console gate fails, and the rollback to :previous is triggered. Verify:

  • Telegram message: ❌ ci-deploy FAILED at step "Run Playwright e2e" — rolled back to <prev-sha>
  • https://ui-dashboard.gnerim.ru/ still serves the previous version (check the page or docker inspect flights-web).
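The docker inspect check can be made explicit by comparing image IDs. These are hypothetical helpers, run on pve-201; they assume the rollback starts the :previous image and does not re-tag it afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the rollback rehearsal.

# Prints the tag the container was started from, the image ID it runs, and its start time.
running_image() {
  docker inspect --format '{{.Config.Image}} {{.Image}} {{.State.StartedAt}}' "${1:-flights-web}"
}

# Succeeds if flights-web runs the same image ID that flights-web:previous points at.
rolled_back_to_previous() {
  local running prev
  running=$(docker inspect --format '{{.Image}}' flights-web) || return 1
  prev=$(docker image inspect --format '{{.Id}}' flights-web:previous) || return 1
  [ -n "$running" ] && [ "$running" = "$prev" ]
}
```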

Revert the rehearsal commit when done.

A: rollback itself fails

ssh pve-201 'docker rmi flights-web:previous'

Then push a commit that fails e2e. Rollback step finds no :previous and bails. Verify:

  • Telegram message: 🔥 ci-deploy ROLLBACK FAILED — site is DOWN
  • https://ui-dashboard.gnerim.ru/ returns 502.
  • Manual recovery: ssh pve-201 'docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null; docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:3002:8080 flights-web:<known-good-sha>'.
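The manual recovery command can be wrapped so it never stops the running container for an image that is not there. recover_flights_web is a hypothetical wrapper, run on pve-201; it issues the same stop, rm and run commands with the same port binding and restart policy.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual recovery command.
recover_flights_web() {
  local tag="${1:-}"
  if [ -z "$tag" ]; then
    echo "usage: recover_flights_web <known-good-sha>" >&2
    return 2
  fi
  # Refuse to touch the running container unless the target image exists locally.
  if ! docker image inspect "flights-web:$tag" >/dev/null 2>&1; then
    echo "image flights-web:$tag not found locally; not touching the container" >&2
    return 1
  fi
  docker stop flights-web >/dev/null 2>&1 || true
  docker rm flights-web >/dev/null 2>&1 || true
  docker run -d --name flights-web --restart unless-stopped \
    -p 127.0.0.1:3002:8080 "flights-web:$tag"
}
```

The same wrapper covers the later "container died" case: recover_flights_web current.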

B: blocked on A not green

Trigger Workflow B (manually or via a tag) for a SHA that has no green Workflow A run. Verify:

  • Telegram message: ⚠️ release blocked — workflow ci-deploy is not green for <sha>
  • B exits early; nothing changes in GitLab.

Manual recovery scenarios

Workflow B succeeded but Jenkins build failed

GitLab is at the new commit; customer site is stale. Recovery:

  1. Open Jenkins UI → check the failing build's console log
  2. Fix the issue (in this repo if it's our bug, otherwise in the customer's infrastructure)
  3. Push fix → Workflow A → Workflow B → trigger Jenkins again

Container running but nginx returns 502

Check the container and its port binding:

ssh pve-201
docker ps --filter name=flights-web
curl -v http://127.0.0.1:3002/   # should return 200 (or whatever the SSR root returns)
sudo nginx -t && sudo systemctl reload nginx
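The same checks can be run in order to narrow a 502 down to the container, the 127.0.0.1:3002 binding, or nginx itself. triage_502 is a hypothetical helper, run on pve-201; curl reports 000 when nothing accepts the connection.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper for "container running but nginx returns 502".
triage_502() {
  local state code
  state=$(docker inspect --format '{{.State.Status}}' flights-web 2>/dev/null) || state="absent"
  if [ "$state" != "running" ]; then
    echo "container: $state (see docker logs flights-web --tail 200)"
    return 1
  fi
  # "|| true" keeps the captured code even when curl exits non-zero (e.g. connection refused).
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' http://127.0.0.1:3002/) || true
  if [ "$code" = "000" ]; then
    echo "container running but nothing answers on 127.0.0.1:3002 (check the -p binding)"
    return 1
  fi
  echo "upstream answers HTTP $code; check the nginx vhost (sudo nginx -t)"
}
```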

If the container has died, its restart policy (unless-stopped) should bring it back. If it does not:

docker logs flights-web --tail 200
docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null
docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:3002:8080 flights-web:current

TIM tunnel is down (502 on /api/* but / works)

sudo systemctl status flights-tim-tunnel.service --no-pager
sudo journalctl -u flights-tim-tunnel.service -n 50 --no-pager
sudo systemctl restart flights-tim-tunnel.service
ss -ltn | grep ':8443\b'   # confirm listener is back
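Right after a restart the listener may take a moment to appear, so a single ss check can give a false alarm. wait_for_tunnel is a hypothetical helper that polls once per second up to a limit (default 30 s, an arbitrary choice); like the commands above, it relies on GNU grep's \b.

```shell
#!/usr/bin/env bash
# Hypothetical helper: waits until something listens on 127.0.0.1:8443.
wait_for_tunnel() {
  local limit="${1:-30}" i=0
  while [ "$i" -lt "$limit" ]; do
    if ss -ltn | grep -q '127\.0\.0\.1:8443\b'; then
      echo "tunnel listener up after ${i}s"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "no listener on 127.0.0.1:8443 after ${limit}s; check journalctl -u flights-tim-tunnel.service" >&2
  return 1
}
```

Usage: sudo systemctl restart flights-tim-tunnel.service && wait_for_tunnel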

If the tunnel won't come up, verify that the pve-201 key is still authorised on webzavod and that webzavod's ppp0 is up (ssh webzavod 'ip -br addr show ppp0').