
pve-201 Deployment Runbook

This is the bootstrap procedure for hosting https://ui-dashboard.gnerim.ru/ on pve-201, plus rehearsal recipes for the CI/CD pipeline failure paths. The full design rationale lives in docs/superpowers/specs/2026-04-25-cicd-pipeline-design.md.

One-time setup

1. Routing pve-201 → TIM API (via webzavod)

On webzavod (192.168.88.58) — verify IP forwarding and MASQUERADE:

sysctl net.ipv4.ip_forward                          # expect: 1
sudo iptables -t nat -L POSTROUTING -nv | grep ppp0 # expect: MASQUERADE rule

If missing:

echo 'net.ipv4.ip_forward=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
sudo iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
sudo apt install iptables-persistent
sudo netfilter-persistent save
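Because iptables -A appends unconditionally, re-running the block above would add a duplicate MASQUERADE rule. A minimal sketch of an idempotent variant follows; ensure_masquerade is a hypothetical helper name, and it relies only on the standard iptables -C (check) option.

```shell
# Hypothetical helper: add the MASQUERADE rule only if it is not already present.
ensure_masquerade() {
  iface=${1:-ppp0}
  # -C exits 0 when an identical rule already exists in the chain.
  if sudo iptables -t nat -C POSTROUTING -o "$iface" -j MASQUERADE 2>/dev/null; then
    echo "rule present"
  else
    sudo iptables -t nat -A POSTROUTING -o "$iface" -j MASQUERADE
    echo "rule added"
  fi
}

# Usage on webzavod:
# ensure_masquerade ppp0 && sudo netfilter-persistent save
```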

On pve-201 — add a persistent static route to TIM via webzavod:

# /etc/netplan/01-routes.yaml — adjust NIC name as needed
network:
  version: 2
  ethernets:
    <nic-name>:                    # replace with actual NIC name from `ip link show`
      routes:
        - to: 172.18.0.0/16
          via: 192.168.88.58

Then apply the change:

sudo netplan apply
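To confirm the route is active, ip route get reports the next hop the kernel would use for a TIM address. The sketch below extracts that next hop; next_hop is a hypothetical helper, and the sample line in the comment uses made-up interface and source values.

```shell
# Hypothetical helper: print the gateway ("via" address) from `ip route get` output.
next_hop() {
  # Reads one line on stdin, for example (sample values):
  #   172.18.0.121 via 192.168.88.58 dev eth0 src 192.168.88.20 uid 1000
  awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# Usage on pve-201:
# [ "$(ip route get 172.18.0.121 | next_hop)" = "192.168.88.58" ] && echo "route OK"
```

An empty result means the address is considered directly reachable or the static route was not installed.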

On pve-201, pin the TIM hostname to a reachable A record (TIM DNS returns two A records for it, one of which is unreachable):

echo '172.18.0.121 flights.test.aeroflot.ru' | sudo tee -a /etc/hosts
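Running the tee -a command twice leaves a duplicate line in /etc/hosts. A possible idempotent variant is sketched below; pin_host is a hypothetical helper, and the optional second argument exists only so the function can be tried against a scratch file.

```shell
# Hypothetical helper: append an entry to a hosts file only if it is not already there.
pin_host() {
  entry=$1
  file=${2:-/etc/hosts}
  # -x matches the whole line, -F treats the entry as a literal string.
  if grep -qxF "$entry" "$file"; then
    echo "already pinned: $entry"
  else
    echo "$entry" | sudo tee -a "$file" >/dev/null
    echo "pinned: $entry"
  fi
}

# Usage on pve-201:
# pin_host '172.18.0.121 flights.test.aeroflot.ru'
```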

Smoke test:

curl -v https://flights.test.aeroflot.ru/swagger/   # expect: 401 in <300ms

If this fails, fix routing/DNS before proceeding — nothing else will work.
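The smoke test can also be scripted so that it checks both the status code and the latency budget. This is a sketch under the thresholds stated above (401, under 300 ms); smoke_ok is a hypothetical helper, and the curl --write-out variables http_code and time_total supply its inputs.

```shell
# Hypothetical helper: pass only if the status is 401 and the total time is under 0.3 s.
smoke_ok() {
  status=$1
  seconds=$2
  [ "$status" = "401" ] || return 1
  # awk does the floating-point comparison that POSIX test cannot.
  awk -v t="$seconds" 'BEGIN { exit !(t + 0 < 0.3) }'
}

# Usage on pve-201:
# set -- $(curl -s -o /dev/null -w '%{http_code} %{time_total}' https://flights.test.aeroflot.ru/swagger/)
# smoke_ok "$1" "$2" && echo "smoke test passed" || echo "smoke test FAILED: status=$1 time=$2s"
```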

2. nginx vhost

cd /path/to/Aeroflot.Flights.Web    # repo root, e.g. ~/repos/Aeroflot.Flights.Web
sudo cp deployment/nginx/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-available/
sudo ln -s /etc/nginx/sites-available/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-enabled/
sudo mkdir -p /etc/nginx/htpasswd
sudo nginx -t
sudo systemctl reload nginx

The htpasswd file is created by scripts/ci/install-htpasswd.sh on first deploy.

3. Gitea runner setup

The runner user must be in the docker group (so it can talk to the Docker socket without sudo) and must be able to reach all upstream services:

sudo usermod -aG docker <runner-user>     # then restart the runner service so the new group membership applies
docker ps                                  # must work without sudo for the runner user

Reachability checks the runner must pass:

curl -fsS https://git.gnerim.ru/                                        # Gitea
curl -fsSI https://teamscore.gitlab.yandexcloud.net/                    # GitLab
curl -fsSI http://jenkins.yc.devwebzavod.ru:8080/                       # Jenkins (via static route)
curl -fsSI http://flights-ui.devwebzavod.ru/                            # Customer URL (via static route)
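Run one at a time, the checks above stop being informative after the first failure. The sketch below probes every URL and reports all failures together; check_reachable is a hypothetical helper, and it uses a HEAD request (-I) with a 10-second cap for every target, which is an assumption (the Gitea check above uses GET).

```shell
# Hypothetical helper: probe every "URL label" pair on stdin and report all failures.
check_reachable() {
  failed=0
  while read -r url label; do
    [ -n "$url" ] || continue
    if curl -fsSI --max-time 10 "$url" >/dev/null 2>&1; then
      echo "ok   $label ($url)"
    else
      echo "FAIL $label ($url)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage as the runner user:
# check_reachable <<'EOF'
# https://git.gnerim.ru/ Gitea
# https://teamscore.gitlab.yandexcloud.net/ GitLab
# http://jenkins.yc.devwebzavod.ru:8080/ Jenkins
# http://flights-ui.devwebzavod.ru/ Customer
# EOF
```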

4. GitLab Personal Access Token

GitLab → User Settings → Access Tokens → create with scopes api and write_repository. Store as Gitea Actions secret GITLAB_PAT.

5. Allow self-approve on GitLab project

GitLab → flights-front project → Settings → Merge requests → Approval rules → uncheck "Prevent approval by author".

Verify by running (locally, after PAT is in place — script is created in Task 17 of the plan):

GITLAB_PAT=<pat> ./scripts/ci/check-gitlab-project.sh

It prints the numeric project ID (store as GITLAB_PROJECT_ID secret) and confirms self-approve is allowed.

6. Jenkins remote trigger token

Jenkins → Aeroflot2/Flights-Front-Dev job → Configure → check "Trigger builds remotely" → set token (e.g. flights-cd-trigger). Store as JENKINS_TRIGGER_TOKEN.

Also: Jenkins → User → Configure → API Token → Add new token. Store username as JENKINS_USER, token as JENKINS_API_TOKEN.
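Once the token and API credentials exist, the remote trigger can be exercised by hand. The sketch below builds the trigger URL; jenkins_build_url is a hypothetical helper, and it assumes Aeroflot2 is a Jenkins folder, so each path segment maps to a /job/<segment> component.

```shell
# Hypothetical helper: turn a folder-style job path into a Jenkins remote-trigger URL.
jenkins_build_url() {
  base=${1%/}          # strip a trailing slash from the base URL
  job_path=$2          # e.g. Aeroflot2/Flights-Front-Dev
  token=$3
  # Each path segment becomes /job/<segment>.
  segments=$(printf '%s' "$job_path" | sed 's|/|/job/|g')
  printf '%s/job/%s/build?token=%s\n' "$base" "$segments" "$token"
}

# Usage (assumes the three Jenkins secrets are exported in the current shell):
# curl -fsS -X POST -u "$JENKINS_USER:$JENKINS_API_TOKEN" \
#   "$(jenkins_build_url http://jenkins.yc.devwebzavod.ru:8080 Aeroflot2/Flights-Front-Dev "$JENKINS_TRIGGER_TOKEN")"
```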

7. Telegram bot

Use existing bot or create via @BotFather. Get the chat_id by sending a message and querying https://api.telegram.org/bot<TOKEN>/getUpdates. Store as TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID.
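Before storing the two values as secrets, it may be worth sending one test message with them. The sketch below calls the Bot API sendMessage method; telegram_send is a hypothetical helper and expects TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID to be set in the current shell.

```shell
# Hypothetical helper: send one test message through the Bot API sendMessage method.
telegram_send() {
  text=$1
  curl -fsS -X POST \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=${text}"
}

# Usage:
# TELEGRAM_BOT_TOKEN=<token> TELEGRAM_CHAT_ID=<chat-id> telegram_send "pve-201 runbook: notification test"
```

A JSON reply containing "ok":true indicates that the token and chat ID match.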

8. Gitea Actions secrets summary

Repo → Settings → Actions → Secrets — set all of:

| Secret | Purpose |
|---|---|
| BASIC_AUTH_USER, BASIC_AUTH_PASS | nginx htpasswd |
| MAP_TILE_URL | Default /map/api/tile/{z}/{x}/{y}.jpeg |
| API_BASE_URL | Default /api |
| GITLAB_PAT, GITLAB_PROJECT_ID | GitLab MR API |
| JENKINS_USER, JENKINS_API_TOKEN, JENKINS_TRIGGER_TOKEN | Jenkins API |
| TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID | Notifications |
| GITHUB_TOKEN | Auto-provided by Gitea Actions, no manual setup required |
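A workflow step can fail fast when a secret was forgotten. The preflight below is a sketch, not part of the existing scripts: missing_secrets is a hypothetical helper, it assumes each secret is mapped to an environment variable of the same name, and it skips MAP_TILE_URL and API_BASE_URL on the assumption that they fall back to the defaults listed above.

```shell
# Hypothetical preflight: list every required variable that is unset or empty.
missing_secrets() {
  for name in BASIC_AUTH_USER BASIC_AUTH_PASS GITLAB_PAT GITLAB_PROJECT_ID \
              JENKINS_USER JENKINS_API_TOKEN JENKINS_TRIGGER_TOKEN \
              TELEGRAM_BOT_TOKEN TELEGRAM_CHAT_ID; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Usage at the start of a job:
# missing=$(missing_secrets)
# [ -z "$missing" ] || { echo "missing secrets: $missing" >&2; exit 1; }
```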

Verifying failure paths

Run at least the rollback and "release blocked" rehearsals once before declaring the pipeline production-grade.

A: e2e fail → rollback

Push a commit that adds console.error('rehearsal') to code that runs on every page (e.g. src/routes/layout.tsx). Workflow A runs, the e2e step fails on the console gate, and the pipeline rolls back to the :previous image. Verify:

  • Telegram message: ❌ ci-deploy FAILED at step "Run Playwright e2e" — rolled back to <prev-sha>
  • https://ui-dashboard.gnerim.ru/ still serves the previous version (check the page or docker inspect flights-web).

Revert the rehearsal commit when done.

A: rollback itself fails

ssh pve-201 'docker rmi flights-web:previous'

Then push a commit that fails e2e. The rollback step finds no :previous image and aborts. Verify:

  • Telegram message: 🔥 ci-deploy ROLLBACK FAILED — site is DOWN
  • https://ui-dashboard.gnerim.ru/ returns 502.
  • Manual recovery: ssh pve-201 'docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null; docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:8081:8080 flights-web:<known-good-sha>'.
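The failure mode rehearsed here comes down to a guard on the :previous tag. The sketch below illustrates that guard; it is hypothetical and not the contents of the real rollback step, reusing only the container name, port mapping, and restart policy from the manual recovery command above.

```shell
# Hypothetical sketch of a rollback step; the real pipeline script may differ.
rollback() {
  image=flights-web
  # `docker image inspect` exits non-zero when the tag does not exist.
  if ! docker image inspect "$image:previous" >/dev/null 2>&1; then
    echo "ROLLBACK FAILED: no $image:previous image" >&2
    return 1
  fi
  # Remove the broken container (ignore errors if it is already gone).
  docker stop "$image" >/dev/null 2>&1
  docker rm "$image" >/dev/null 2>&1
  docker run -d --name "$image" --restart unless-stopped \
    -p 127.0.0.1:8081:8080 "$image:previous"
}
```

Before running docker rmi for this rehearsal, it helps to note a known-good tag from docker images flights-web so that the manual recovery command has something to point at.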

B: blocked on A not green

Trigger Workflow B (manual or tag) for a SHA that has no green Workflow A run. Verify:

  • Telegram message: ⚠️ release blocked — workflow ci-deploy is not green for <sha>
  • B exits early; nothing changes in GitLab.

B: Jenkins poll timeout

Temporarily edit scripts/ci/jenkins-trigger-and-wait.sh to change the default:

TIMEOUT="${JENKINS_TIMEOUT:-30}"   # was 1800

Push to a throwaway branch, trigger Workflow B from that branch via the Gitea UI, and confirm:

  • Telegram message: ❌ release FAILED at Jenkins build (because polling gives up after 30s)
  • The Jenkins job itself may continue running — that's fine, it's outside our control.

Restore the original 1800 default and force-delete the throwaway branch when done.
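For orientation, a poll-with-timeout loop of the kind this rehearsal exercises might look like the sketch below. It is not the contents of jenkins-trigger-and-wait.sh: wait_for_build and JENKINS_POLL_INTERVAL are hypothetical names, and the status command passed in is expected to print the Jenkins build result (null while the build is still running).

```shell
# Hypothetical sketch of a poll-with-timeout loop; not the contents of the real script.
TIMEOUT="${JENKINS_TIMEOUT:-1800}"          # seconds before giving up
INTERVAL="${JENKINS_POLL_INTERVAL:-10}"     # hypothetical variable: seconds between polls

# wait_for_build CMD...: run CMD repeatedly until it prints a final result or time runs out.
wait_for_build() {
  elapsed=0
  while [ "$elapsed" -lt "$TIMEOUT" ]; do
    result=$("$@")
    case "$result" in
      SUCCESS) echo "build succeeded"; return 0 ;;
      null|"") ;;                            # still running, keep polling
      *) echo "build finished with $result"; return 1 ;;
    esac
    sleep "$INTERVAL"
    elapsed=$((elapsed + INTERVAL))
  done
  echo "timed out after ${TIMEOUT}s" >&2
  return 2
}
```

Returning a distinct exit code for the timeout lets the caller send a different notification for "Jenkins failed" and "gave up waiting".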

Manual recovery scenarios

Workflow B failed at steps 12-13 (Jenkins): MR merged but customer site stale

GitLab is already at the new commit; Jenkins didn't deploy. Recovery:

  1. Open Jenkins UI → click "Build Now" on the same job, or
  2. Push a new commit to GitLab to re-trigger Jenkins (only if the job is configured to poll the repository), or
  3. Re-run Workflow B from a green Workflow A — but only if you also pushed new code; otherwise B will sync a no-op and skip.

Container running but nginx returns 502

Check that the container is running and answering on the port nginx proxies to (127.0.0.1:8081):

ssh pve-201
docker ps --filter name=flights-web
curl -v http://127.0.0.1:8081/   # should return 200 (or whatever the SSR root returns)
sudo nginx -t && sudo systemctl reload nginx

If the container died, the unless-stopped restart policy should bring it back. If it does not:

docker logs flights-web --tail 200
docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null
docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:8081:8080 flights-web:current