flights_web/deployment/README.md
gnezim 03eeddfbf8 CI/CD pipeline: ssh -L tunnel for TIM API + manual Jenkins trigger
Two design pivots discovered during Phase B prerequisites:

Routing: Replace static-route + NAT plan with persistent ssh -L tunnel
from pve-201 to webzavod (deployment/systemd/flights-tim-tunnel.service).
nginx proxies /api/ and /map/api/ to https://127.0.0.1:8443 with SNI/Host
overrides so cert validation still targets the real hostname. No webzavod
kernel changes (no ip_forward/MASQUERADE), no /etc/hosts pin needed.

Workflow B: Drop Jenkins trigger/poll automation (operator lacks Jenkins
job-configure access and user API token access). release.yml now stops
after MR merge with a Telegram message containing the Jenkins job URL.
release-verify.yml (new, workflow_dispatch only) runs the customer-URL
e2e suite once the operator has triggered Jenkins manually and it has
completed.

Other:
- SSR loopback port 8081 -> 3002 (8081 was taken by openwebui on pve-201)
- notify-telegram.sh skips cleanly when TG secrets unset (was: hard-fail)
- README + spec addendum cover the new prereqs and removed steps
2026-04-27 11:58:39 +03:00


# pve-201 Deployment Runbook
This is the bootstrap procedure for hosting `https://ui-dashboard.gnerim.ru/` on pve-201, plus rehearsal recipes for the CI/CD pipeline failure paths. The full design rationale lives in `docs/superpowers/specs/2026-04-25-cicd-pipeline-design.md`.
## One-time setup
### 1. SSH tunnel pve-201 → webzavod (TIM API access)
The customer WAF on `flights.test.aeroflot.ru` only accepts requests from corp-VPN egress IPs. nginx proxies `/api/` and `/map/api/` to `https://127.0.0.1:8443`, which is forwarded over SSH to webzavod (which terminates the corp VPN on `ppp0`). A systemd unit keeps the tunnel up.
**On webzavod (192.168.88.58)** — append the pve-201 pubkey to `~gnezim/.ssh/authorized_keys` with `permitopen` restricting it to one host:port (one-time, read pve-201's `~gnezim/.ssh/id_rsa.pub` first):
```
command="exit 1",no-pty,no-X11-forwarding,no-agent-forwarding,no-user-rc,permitopen="flights.test.aeroflot.ru:443" ssh-rsa AAAA…== pve-201-flights-tim-tunnel
```
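The options on that key line only allow a forward-only connection: no shell, no remote command, and a single `permitopen` target. A minimal sketch of a matching client invocation follows. The real command lives in `deployment/systemd/flights-tim-tunnel.service`, which is not reproduced here, so the function name and the `ExitOnForwardFailure` / `ServerAliveInterval` options are assumptions rather than the unit's actual contents.

```bash
# Hypothetical helper: open the forward-only tunnel that the restricted key permits.
# -N requests no remote command, so the forced command="exit 1" never runs.
start_tim_tunnel() {
  ssh -N \
    -o ExitOnForwardFailure=yes \
    -o ServerAliveInterval=30 \
    -L 127.0.0.1:8443:flights.test.aeroflot.ru:443 \
    gnezim@192.168.88.58
}
```

Without `-N`, the client would ask for a session, hit the forced `exit 1`, and disconnect immediately, taking the forward down with it.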
**On pve-201** — install + enable the systemd unit:
```bash
cd /path/to/Aeroflot.Flights.Web
sudo cp deployment/systemd/flights-tim-tunnel.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now flights-tim-tunnel.service
sudo systemctl status flights-tim-tunnel.service --no-pager
```
**Smoke test:**
```bash
ss -ltn | grep ':8443\b' # expect: a 127.0.0.1:8443 LISTEN line
curl -k --resolve flights.test.aeroflot.ru:8443:127.0.0.1 \
  -o /dev/null -w 'swagger: %{http_code}\n' \
  https://flights.test.aeroflot.ru:8443/swagger/index.html # expect 401
curl -k --resolve flights.test.aeroflot.ru:8443:127.0.0.1 \
  -o /dev/null -w 'api/health: %{http_code}\n' \
  https://flights.test.aeroflot.ru:8443/api/health # expect 200
```
If swagger returns 200 with an HTML body instead of 401, the tunnel is being bypassed and the request egressed directly. Fix the listener or the SSH unit before proceeding.
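The expected codes can be turned into a pass/fail check. The sketch below is one way to do that, assuming the two status codes have already been captured from the `curl -w '%{http_code}'` calls; `check_tim_smoke` is a hypothetical name, not a script in this repo.

```bash
# Hypothetical helper: interpret the two status codes from the smoke test.
# Usage: check_tim_smoke <swagger_code> <health_code>
check_tim_smoke() {
  local swagger_code=$1 health_code=$2
  if [ "$swagger_code" = "200" ]; then
    echo "FAIL: swagger returned 200, request is not going through the tunnel"
    return 1
  fi
  if [ "$swagger_code" != "401" ] || [ "$health_code" != "200" ]; then
    echo "FAIL: swagger=$swagger_code api/health=$health_code (expected 401 and 200)"
    return 1
  fi
  echo "OK: tunnel is serving TIM API"
}
```

A swagger 200 gets its own message because it signals a bypassed tunnel rather than a broken one.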
### 2. nginx vhost
```bash
cd /path/to/Aeroflot.Flights.Web
sudo cp deployment/nginx/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-available/
sudo ln -sf /etc/nginx/sites-available/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-enabled/
sudo mkdir -p /etc/nginx/htpasswd
sudo nginx -t
sudo systemctl reload nginx
```
The `htpasswd` file is created by `scripts/ci/install-htpasswd.sh` on first deploy.
### 3. Gitea runner setup
The runner user must be in the `docker` group (so it can talk to the Docker socket without sudo) and must be able to reach all upstream services:
```bash
sudo usermod -aG docker <runner-user> # then restart the runner service so the new group membership applies
docker ps # must work without sudo for the runner user
```
Reachability checks the runner must pass:
```bash
curl -fsS https://git.gnerim.ru/ # Gitea
curl -fsSI https://teamscore.gitlab.yandexcloud.net/ # GitLab
```
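When several endpoints need checking, it helps to see every failure at once instead of stopping at the first. A minimal sketch, assuming a hypothetical `check_reachability` helper and a 10-second timeout chosen for illustration; it issues GET requests for every URL, whereas the commands above use HEAD for GitLab:

```bash
# Hypothetical helper: probe each URL and report every failure, not just the first.
# Usage: check_reachability <url>...
check_reachability() {
  local failed=0 url
  for url in "$@"; do
    if curl -fsS -o /dev/null --max-time 10 "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

Example: `check_reachability https://git.gnerim.ru/ https://teamscore.gitlab.yandexcloud.net/` returns non-zero if either service is unreachable.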
The customer Jenkins URL and the customer site (`flights-ui.devwebzavod.ru`) are NOT reachable from the runner directly, and Workflow B does not call them. Customer-side e2e (Workflow C, `release-verify`) runs only after the operator has manually triggered the Jenkins build and it has completed. It reaches the customer URL the same way the upstream API is reached: by direct egress where possible, otherwise through additional tunnels added on demand.
### 4. GitLab Personal Access Token
GitLab → User Settings → Access Tokens → create with scopes `api` and `write_repository`. Store as Gitea Actions secret `GITLAB_PAT`.
### 5. Allow self-approve on GitLab project
GitLab → flights-front project → Settings → Merge requests → Approval rules → uncheck **"Prevent approval by author"** (skip if you can already approve your own MRs in the GitLab UI).
Verify by running the check script (locally, after the PAT is in place):
```bash
GITLAB_PAT=<pat> ./scripts/ci/check-gitlab-project.sh
```
It prints the numeric project ID (store as `GITLAB_PROJECT_ID` secret) and confirms self-approve is allowed.
### 6. Telegram bot (optional)
Use an existing bot or create one via @BotFather. Get the chat_id by sending a message to the bot and then querying `https://api.telegram.org/bot<TOKEN>/getUpdates`. Store the values as `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID`.
If either secret is unset, all `notify-telegram.sh` calls in the workflows skip cleanly with no error — the pipeline runs end-to-end without Telegram configured.
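The skip behaviour can be pictured roughly as follows. This is a sketch, not the contents of `notify-telegram.sh`: the function name and log line are assumptions, and only the Bot API `sendMessage` call with `chat_id` and `text` is taken from Telegram's documented interface.

```bash
# Hypothetical sketch of the skip logic; the real script is notify-telegram.sh.
# Usage: notify_telegram "<message text>"
notify_telegram() {
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    # Missing secrets are not an error: log to stderr and report success.
    echo "notify-telegram: secrets unset, skipping" >&2
    return 0
  fi
  curl -fsS -o /dev/null \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=$1" \
    "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage"
}
```

Returning 0 on the skip path is what keeps a workflow step from failing when Telegram is not configured.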
### 7. Gitea Actions secrets summary
Repo → Settings → Actions → Secrets. Set the required secrets below; the optional ones fall back to the listed defaults or are skipped:

| Secret | Required | Purpose |
|---|---|---|
| `BASIC_AUTH_USER`, `BASIC_AUTH_PASS` | yes | nginx htpasswd for `ui-dashboard.gnerim.ru` |
| `MAP_TILE_URL` | optional | Default `/map/api/tile/{z}/{x}/{y}.jpeg` |
| `API_BASE_URL` | optional | Default `/api` |
| `GITLAB_PAT`, `GITLAB_PROJECT_ID` | yes (release only) | GitLab MR API |
| `TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID` | optional | Notifications |
| `GITHUB_TOKEN` | auto | Provided by Gitea Actions; no manual setup required |

Jenkins is triggered manually after the release workflow merges to GitLab; no Jenkins secret is required.
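For the two optional URL secrets, the fallback to the defaults in the table could look like the sketch below. Where the pipeline actually applies these defaults (workflow, build script, or app config) is not stated here, so `resolve_build_env` is purely illustrative.

```bash
# Hypothetical sketch: fall back to the documented defaults when a secret is unset or empty.
resolve_build_env() {
  # The tile template contains literal braces, so keep it in its own variable:
  # a "}" written inside ${VAR:-...} would end the expansion early.
  local default_tile_url='/map/api/tile/{z}/{x}/{y}.jpeg'
  MAP_TILE_URL="${MAP_TILE_URL:-$default_tile_url}"
  API_BASE_URL="${API_BASE_URL:-/api}"
  export MAP_TILE_URL API_BASE_URL
}
```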
## Verifying failure paths
Run at least the rollback and "release blocked" rehearsals once before declaring the pipeline production-grade.
### A: e2e fail → rollback
Push a commit that adds `console.error('rehearsal')` somewhere that runs on every page (e.g. `src/routes/layout.tsx`). Workflow A runs, e2e fails on the console gate, and the rollback to `:previous` triggers. Verify:
- Telegram message: `❌ ci-deploy FAILED at step "Run Playwright e2e" — rolled back to <prev-sha>`
- `https://ui-dashboard.gnerim.ru/` still serves the previous version (check the page or `docker inspect flights-web`).

Revert the rehearsal commit when done.
### A: rollback itself fails
```bash
ssh pve-201 'docker rmi flights-web:previous'
```
Then push a commit that fails e2e. The rollback step finds no `:previous` image and bails. Verify:
- Telegram message: `🔥 ci-deploy ROLLBACK FAILED — site is DOWN`
- `https://ui-dashboard.gnerim.ru/` returns 502.
- Manual recovery: `ssh pve-201 'docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null; docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:3002:8080 flights-web:<known-good-sha>'`.
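The failure mode rehearsed above comes down to a guard on the `:previous` tag. A minimal sketch of such a rollback step, assuming a hypothetical `rollback_to_previous` function; the real step lives in the CI workflow and may differ, and only the `docker run` arguments are taken from the recovery command above:

```bash
# Hypothetical sketch of the rollback step; the real logic lives in the CI workflow.
rollback_to_previous() {
  # Bail out before touching the running container if there is nothing to roll back to.
  if ! docker image inspect flights-web:previous >/dev/null 2>&1; then
    echo "ROLLBACK FAILED: no flights-web:previous image" >&2
    return 1
  fi
  docker stop flights-web >/dev/null 2>&1 || true
  docker rm flights-web >/dev/null 2>&1 || true
  docker run -d --name flights-web --restart unless-stopped \
    -p 127.0.0.1:3002:8080 flights-web:previous
}
```

In this sketch the image check comes first, so a missing `:previous` does not also stop whatever container is currently running.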
### B: blocked on A not green
Trigger Workflow B (manual or tag) for a SHA that has no green Workflow A run. Verify:
- Telegram message: `⚠️ release blocked — workflow ci-deploy is not green for <sha>`
- B exits early; nothing changes in GitLab.
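The gate is fail-closed: anything other than a confirmed green run blocks the release. A rough sketch of that decision, assuming the state of Workflow A for the SHA has already been looked up and is passed in as a string (how the real workflow obtains it is not described here, and `release_gate` is a hypothetical name):

```bash
# Hypothetical gate: decide from a status string whether the release may proceed.
# Usage: release_gate <state> <sha>
release_gate() {
  if [ "$1" = "success" ]; then
    echo "ci-deploy is green for $2, continuing"
    return 0
  fi
  # Missing, pending, and failed states are all treated as "not green".
  echo "release blocked: workflow ci-deploy is not green for $2 (state: ${1:-unknown})"
  return 1
}
```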
## Manual recovery scenarios
### Workflow B succeeded but Jenkins build failed
GitLab is at the new commit; customer site is stale. Recovery:
1. Open Jenkins UI → check the failing build's console log
2. Fix the issue (in this repo if it's our bug, in customer's infra otherwise)
3. Push fix → Workflow A → Workflow B → trigger Jenkins again
### Container running but nginx returns 502
Check the bind:
```bash
ssh pve-201
docker ps --filter name=flights-web
curl -v http://127.0.0.1:3002/ # should return 200 (or whatever the SSR root returns)
sudo nginx -t && sudo systemctl reload nginx
```
If the container died, the `unless-stopped` restart policy should bring it back. If it does not:
```bash
docker logs flights-web --tail 200
docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null
docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:3002:8080 flights-web:current
```
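After recreating the container, the SSR process may need a few seconds before it answers. A small polling sketch, assuming a hypothetical `wait_for_ssr` helper with illustrative defaults for attempts and delay; `curl -f` treats any status of 400 or above as "not up yet":

```bash
# Hypothetical helper: poll the SSR loopback port until it answers or attempts run out.
# Usage: wait_for_ssr [attempts] [delay_seconds]
wait_for_ssr() {
  local attempts=${1:-10} delay=${2:-2} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null http://127.0.0.1:3002/; then
      echo "SSR is up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "SSR did not answer on 127.0.0.1:3002 after $attempts attempt(s)" >&2
  return 1
}
```

If this succeeds and nginx still returns 502, the problem is more likely in the vhost than in the container.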
### TIM tunnel is down (502 on /api/* but / works)
```bash
sudo systemctl status flights-tim-tunnel.service --no-pager
sudo journalctl -u flights-tim-tunnel.service -n 50 --no-pager
sudo systemctl restart flights-tim-tunnel.service
ss -ltn | grep ':8443\b' # confirm listener is back
```
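If the tunnel won't come up, verify that the SSH key is still authorised on webzavod and that webzavod's `ppp0` is up (`ssh webzavod 'ip -br addr show ppp0'`). Run that check with a login that has a normal shell on webzavod: the tunnel key is pinned to `command="exit 1"`, so it cannot run remote commands.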
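The listener check can be made stricter than `grep ':8443\b'`, which would also match a socket bound to `0.0.0.0:8443`. The sketch below assumes a hypothetical `has_tunnel_listener` helper that reads `ss -ltn` output on stdin and accepts only the loopback address:

```bash
# Hypothetical helper: read `ss -ltn` output on stdin and succeed only if
# something is listening on the loopback tunnel port.
has_tunnel_listener() {
  grep -Eq '(^|[[:space:]])127\.0\.0\.1:8443([[:space:]]|$)'
}
```

Example: `ss -ltn | has_tunnel_listener || sudo systemctl restart flights-tim-tunnel.service`.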