diff --git a/deployment/README.md b/deployment/README.md
new file mode 100644
index 00000000..4abb46d4
--- /dev/null
+++ b/deployment/README.md
@@ -0,0 +1,188 @@
+# pve-201 Deployment Runbook
+
+This is the bootstrap procedure for hosting `https://ui-dashboard.gnerim.ru/` on pve-201, plus rehearsal recipes for the CI/CD pipeline failure paths. The full design rationale lives in `docs/superpowers/specs/2026-04-25-cicd-pipeline-design.md`.
+
+## One-time setup
+
+### 1. Routing pve-201 → TIM API (via webzavod)
+
+**On webzavod (192.168.88.58)** — verify IP forwarding and MASQUERADE:
+
+```bash
+sysctl net.ipv4.ip_forward  # expect: 1
+sudo iptables -t nat -L POSTROUTING -nv | grep ppp0  # expect: MASQUERADE rule
+```
+
+If missing:
+
+```bash
+echo 'net.ipv4.ip_forward=1' | sudo tee -a /etc/sysctl.conf
+sudo sysctl -p
+sudo iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE
+sudo apt install iptables-persistent
+sudo netfilter-persistent save
+```
+
+**On pve-201** — add a persistent static route to TIM via webzavod:
+
+```yaml
+# /etc/netplan/01-routes.yaml — adjust NIC name as needed
+network:
+  version: 2
+  ethernets:
+    eth0:
+      routes:
+        - to: 172.18.0.0/16
+          via: 192.168.88.58
+```
+
+```bash
+sudo netplan apply
+```
+
+**On pve-201** — pin TIM hostnames to reachable A records (TIM DNS returns duplicate A records, one of which is dead):
+
+```bash
+echo '172.18.0.121 flights.test.aeroflot.ru' | sudo tee -a /etc/hosts
+```
+
+**Smoke test:**
+
+```bash
+curl -v https://flights.test.aeroflot.ru/swagger/  # expect: 401 in <300ms
+```
+
+If this fails, fix routing/DNS before proceeding — nothing else will work.
+
+### 2. nginx vhost
+
+```bash
+sudo cp deployment/nginx/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-available/
+sudo ln -s /etc/nginx/sites-available/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-enabled/
+sudo mkdir -p /etc/nginx/htpasswd
+sudo nginx -t
+sudo systemctl reload nginx
+```
+
+The `htpasswd` file is created by `scripts/ci/install-htpasswd.sh` on first deploy.
+
+### 3. Gitea runner setup
+
+The runner must be in the `docker` group (so it can talk to the Docker socket without sudo) and must be able to reach all upstream services:
+
+```bash
+sudo usermod -aG docker  # then re-login the runner service
+docker ps  # must work without sudo for the runner user
+```
+
+Reachability checks the runner must pass:
+
+```bash
+curl -fsS https://git.gnerim.ru/  # Gitea
+curl -fsSI https://teamscore.gitlab.yandexcloud.net/  # GitLab
+curl -fsSI http://jenkins.yc.devwebzavod.ru:8080/  # Jenkins (via static route)
+curl -fsSI http://flights-ui.devwebzavod.ru/  # Customer URL (via static route)
+```
+
+### 4. GitLab Personal Access Token
+
+GitLab → User Settings → Access Tokens → create with scopes `api` and `write_repository`. Store it as the Gitea Actions secret `GITLAB_PAT`.
+
+### 5. Allow self-approve on GitLab project
+
+GitLab → flights-front project → Settings → Merge requests → Approval rules → uncheck **"Prevent approval by author"**.
+
+Verify by running (locally, after the PAT is in place):
+
+```bash
+GITLAB_PAT= ./scripts/ci/check-gitlab-project.sh
+```
+
+It prints the numeric project ID (store it as the `GITLAB_PROJECT_ID` secret) and confirms that self-approve is allowed.
+
+### 6. Jenkins remote trigger token
+
+Jenkins → `Aeroflot2/Flights-Front-Dev` job → Configure → check **"Trigger builds remotely"** → set token (e.g. `flights-cd-trigger`). Store it as `JENKINS_TRIGGER_TOKEN`.
+
+Also: Jenkins → User → Configure → API Token → Add new token. Store the username as `JENKINS_USER` and the token as `JENKINS_API_TOKEN`.
+
+### 7. Telegram bot
+
+Use an existing bot or create one via @BotFather. Get the chat_id by sending the bot a message and querying `https://api.telegram.org/bot/getUpdates`. Store the values as `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID`.
+
+### 8. Gitea Actions secrets summary
+
+Repo → Settings → Actions → Secrets — set all of:
+
+| Secret | Purpose |
+|---|---|
+| `BASIC_AUTH_USER`, `BASIC_AUTH_PASS` | nginx htpasswd |
+| `MAP_TILE_URL` | Default `/map/api/tile/{z}/{x}/{y}.jpeg` |
+| `API_BASE_URL` | Default `/api` |
+| `GITLAB_PAT`, `GITLAB_PROJECT_ID` | GitLab MR API |
+| `JENKINS_USER`, `JENKINS_API_TOKEN`, `JENKINS_TRIGGER_TOKEN` | Jenkins API |
+| `TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID` | Notifications |
+
+## Verifying failure paths
+
+Run at least the rollback and "release blocked" rehearsals once before declaring the pipeline production-grade.
+
+### A: e2e fail → rollback
+
+Push a commit that adds `console.error('rehearsal')` somewhere that runs on every page (e.g. `src/routes/layout.tsx`). Workflow A runs, e2e fails on the console-gate, and the rollback to `:previous` triggers. Verify:
+
+- Telegram message: `❌ ci-deploy FAILED at step "Run Playwright e2e" — rolled back to `
+- `https://ui-dashboard.gnerim.ru/` still serves the previous version (check the page or `docker inspect flights-web`).
+
+Revert the rehearsal commit when done.
+
+### A: rollback itself fails
+
+```bash
+ssh pve-201 'docker rmi flights-web:previous'
+```
+
+Then push a commit that fails e2e. The rollback step finds no `:previous` image and bails. Verify:
+
+- Telegram message: `🔥 ci-deploy ROLLBACK FAILED — site is DOWN`
+- `https://ui-dashboard.gnerim.ru/` returns 502.
+- Manual recovery: `ssh pve-201 'docker run -d --name flights-web -p 127.0.0.1:8081:8080 flights-web:'`.
+
+### B: blocked on A not green
+
+Trigger Workflow B (manual or tag) for a SHA that has no green Workflow A run. Verify:
+
+- Telegram message: `⚠️ release blocked — workflow ci-deploy is not green for `
+- B exits early; nothing changes in GitLab.
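The gate exercised in this rehearsal can be pictured as a small shell check. This is only a sketch under stated assumptions, not the pipeline's actual implementation: `is_green` and `gate_release` are hypothetical helpers, the state strings (`success`, `pending`, `failure`) are an assumed convention for how a Workflow A result is recorded, and the printed lines are illustrative rather than the real Telegram text.

```shell
#!/usr/bin/env bash
# Hypothetical gate: decide whether Workflow B may proceed for a given SHA.
# How the Workflow A state is fetched is out of scope for this sketch;
# the caller passes it in as a plain string.

# is_green STATE -> exit 0 only when Workflow A finished successfully.
is_green() {
  [ "$1" = "success" ]
}

# gate_release SHA STATE -> print a decision line; return 1 when blocked.
gate_release() {
  local sha="$1" state="$2"
  if is_green "$state"; then
    echo "release allowed for ${sha}"
    return 0
  fi
  # An empty state means no Workflow A run was found for this SHA.
  echo "release blocked: ci-deploy is not green for ${sha} (state: ${state:-none})"
  return 1
}
```

Treating a missing state the same as a failed one keeps the check fail-closed, which matches the expectation above that B exits early and changes nothing in GitLab.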
+
+### B: Jenkins poll timeout
+
+Set `JENKINS_TIMEOUT=30` as a secret override and trigger B. Polling should give up after 30s and report a timeout.
+
+## Manual recovery scenarios
+
+### Workflow B failed at step 12-13 (Jenkins) — MR merged but customer site stale
+
+GitLab is already at the new commit; Jenkins didn't deploy. Recovery:
+
+1. Open Jenkins UI → click "Build Now" on the same job, or
+2. Push a new commit to GitLab to re-trigger Jenkins polling (if it's set up that way), or
+3. Re-run Workflow B from a green Workflow A — but only if you also pushed new code; otherwise B will sync a no-op and skip.
+
+### Container running but nginx returns 502
+
+Check the bind:
+
+```bash
+ssh pve-201
+docker ps --filter name=flights-web
+curl -v http://127.0.0.1:8081/  # should return 200 (or whatever the SSR root returns)
+sudo nginx -t && sudo systemctl reload nginx
+```
+
+If the container died, the restart policy `unless-stopped` should bring it back. If not:
+
+```bash
+docker logs flights-web --tail 200
+docker rm -f flights-web  # a stopped container still holds the name; docker run --name would fail otherwise
+docker run -d --name flights-web -p 127.0.0.1:8081:8080 flights-web:current
+```
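
The 502 triage above can be condensed into a small decision helper. This is a minimal sketch, assuming the container status is read with `docker inspect -f '{{.State.Status}}' flights-web`; the function name `next_action` and the action labels it prints are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper for the "nginx returns 502" scenario.
# Input: the container status string reported by Docker, or an empty
# string when no container named flights-web exists.

# next_action STATUS -> print a label for the suggested next step.
next_action() {
  case "$1" in
    running)
      # Container is up: the problem is more likely the bind or nginx.
      echo "check-nginx" ;;
    exited|dead|created)
      # Container exists but is not running: read the logs, remove it,
      # then start a fresh one.
      echo "remove-and-rerun" ;;
    "")
      # No container at all: start one.
      echo "run" ;;
    *)
      # restarting, paused, removing, ...: look at the logs first.
      echo "inspect-logs" ;;
  esac
}
```

On pve-201 it could be called as `next_action "$(docker inspect -f '{{.State.Status}}' flights-web 2>/dev/null)"`. The helper only prints a label and leaves the actual `docker` commands to the operator.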