# pve-201 Deployment Runbook

This is the bootstrap procedure for hosting `https://ui-dashboard.gnerim.ru/` on pve-201, plus rehearsal recipes for the CI/CD pipeline failure paths. The full design rationale lives in `docs/superpowers/specs/2026-04-25-cicd-pipeline-design.md`.

## One-time setup

### 1. SSH tunnel pve-201 → webzavod (TIM API access)

The customer WAF on `flights.test.aeroflot.ru` only accepts requests from corp-VPN egress IPs. nginx therefore proxies `/api/` and `/map/api/` to `https://127.0.0.1:8443`, the local end of an SSH tunnel to webzavod, which terminates the corp VPN on `ppp0`. A systemd unit keeps the tunnel up.

**On webzavod (192.168.88.58):** append the pve-201 public key to `~gnezim/.ssh/authorized_keys`, with `permitopen` restricting it to a single host:port. This is a one-time step; read pve-201's `~gnezim/.ssh/id_rsa.pub` first.

```
command="exit 1",no-pty,no-X11-forwarding,no-agent-forwarding,no-user-rc,permitopen="flights.test.aeroflot.ru:443" ssh-rsa AAAA…== pve-201-flights-tim-tunnel
```

**On pve-201:** install and enable the systemd unit.

```bash
cd /path/to/Aeroflot.Flights.Web
sudo cp deployment/systemd/flights-tim-tunnel.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now flights-tim-tunnel.service
sudo systemctl status flights-tim-tunnel.service --no-pager
```

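The unit file itself is not reproduced in this runbook. As a rough sketch of the kind of command such a unit keeps alive, the hypothetical helper below prints a local port-forward that matches the `permitopen` entry above. The account `gnezim@192.168.88.58` is inferred from the `authorized_keys` path. The `ExitOnForwardFailure` and `ServerAliveInterval` options are assumptions, not the contents of `flights-tim-tunnel.service`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints an ssh command a tunnel unit like this might run.
# The option values are assumptions, not taken from the real unit file.
tunnel_cmd() {
  local listen="127.0.0.1:8443"                # local end that nginx proxies to
  local target="flights.test.aeroflot.ru:443"  # must match permitopen= on webzavod
  local jump="gnezim@192.168.88.58"            # webzavod account holding the key
  # -N: request no remote command; the key's forced command="exit 1" would
  # otherwise end the session immediately.
  printf 'ssh -N -o ExitOnForwardFailure=yes -o ServerAliveInterval=30 -L %s:%s %s\n' \
    "$listen" "$target" "$jump"
}

tunnel_cmd
```

Running the printed command by hand in a spare terminal is one way to debug the key restrictions before involving systemd.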
**Smoke test:**

```bash
ss -ltn | grep ':8443\b'   # expect: a 127.0.0.1:8443 LISTEN line
curl -k --resolve flights.test.aeroflot.ru:8443:127.0.0.1 \
  -o /dev/null -w 'swagger: %{http_code}\n' \
  https://flights.test.aeroflot.ru:8443/swagger/index.html   # expect 401
curl -k --resolve flights.test.aeroflot.ru:8443:127.0.0.1 \
  -o /dev/null -w 'api/health: %{http_code}\n' \
  https://flights.test.aeroflot.ru:8443/api/health   # expect 200
```

If swagger returns 200 with an HTML body instead of 401, the tunnel is being bypassed and the request egressed directly. Fix the listener or the SSH unit before proceeding.

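The pass/fail rule for the smoke test can be written down as a small decision function. This is only a sketch: `tunnel_verdict` is a hypothetical name, and the `unknown` branch is an addition to cover cases the runbook does not describe, such as curl printing `000` when nothing answers on the port.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps the two smoke-test status codes to a verdict.
# Expected values (401 for swagger, 200 for /api/health) come from the runbook.
tunnel_verdict() {
  local swagger_code="$1" health_code="$2"
  if [ "$swagger_code" = "200" ]; then
    echo "bypassed"   # the runbook treats 200 on swagger as a bypassed tunnel
  elif [ "$swagger_code" = "401" ] && [ "$health_code" = "200" ]; then
    echo "ok"
  else
    echo "unknown"    # e.g. 000 when nothing listens on 127.0.0.1:8443
  fi
}

tunnel_verdict 401 200
tunnel_verdict 200 200
```

The two calls print `ok` and `bypassed` respectively.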
### 2. nginx vhost

```bash
cd /path/to/Aeroflot.Flights.Web
sudo cp deployment/nginx/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-available/
sudo ln -sf /etc/nginx/sites-available/ui-dashboard.gnerim.ru.conf /etc/nginx/sites-enabled/
sudo mkdir -p /etc/nginx/htpasswd
sudo nginx -t
sudo systemctl reload nginx
```

The `htpasswd` file is created by `scripts/ci/install-htpasswd.sh` on first deploy.

### 3. Gitea runner setup

The runner user must be in the `docker` group (so it can talk to the Docker socket without sudo) and must be able to reach all upstream services:

```bash
sudo usermod -aG docker <runner-user>   # then restart the runner service so the new group applies
docker ps                               # must work without sudo for the runner user
```

Reachability checks the runner must pass:

```bash
curl -fsS https://git.gnerim.ru/                        # Gitea
curl -fsSI https://teamscore.gitlab.yandexcloud.net/    # GitLab
```

The customer Jenkins URL and the customer site (`flights-ui.devwebzavod.ru`) are NOT reachable from the runner directly, and Workflow B does not call them. Customer-side e2e (Workflow C, `release-verify`) only runs after the operator has manually triggered the Jenkins build, and it reaches the customer URL the same way the upstream API is reached: direct egress where possible, or through additional tunnels added on demand.

### 4. GitLab Personal Access Token

GitLab → User Settings → Access Tokens → create a token with scopes `api` and `write_repository`. Store it as the Gitea Actions secret `GITLAB_PAT`.

### 5. Allow self-approve on GitLab project

GitLab → flights-front project → Settings → Merge requests → Approval rules → uncheck **"Prevent approval by author"** (skip if you can already approve your own MRs in the GitLab UI).

Verify by running (locally, after PAT is in place):

```bash
GITLAB_PAT=<pat> ./scripts/ci/check-gitlab-project.sh
```

It prints the numeric project ID (store as `GITLAB_PROJECT_ID` secret) and confirms self-approve is allowed.

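For readers who want to see what such a check involves, here is a minimal sketch. It is not the contents of `check-gitlab-project.sh`. It assumes the script relies on GitLab's project-level approvals endpoint, whose JSON includes a `merge_requests_author_approval` flag, and `self_approve_allowed` is a hypothetical name. The parsing is deliberately crude (no `jq` dependency).

```bash
#!/usr/bin/env bash
# Hypothetical sketch; not the real check-gitlab-project.sh.
# Reads a GitLab project-approvals JSON document on stdin and succeeds when
# authors may approve their own merge requests.
self_approve_allowed() {
  tr -d '[:space:]' | grep -q '"merge_requests_author_approval":true'
}

# Example call (needs network, GITLAB_PAT and GITLAB_PROJECT_ID):
#   curl -fsS -H "PRIVATE-TOKEN: ${GITLAB_PAT}" \
#     "https://teamscore.gitlab.yandexcloud.net/api/v4/projects/${GITLAB_PROJECT_ID}/approvals" \
#     | self_approve_allowed && echo "self-approve allowed"
```

Because the function only reads stdin, it can be exercised against a saved API response without touching GitLab.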
### 6. Telegram bot (optional)

Use an existing bot or create one via @BotFather. Get the `chat_id` by sending the bot a message and then querying `https://api.telegram.org/bot<TOKEN>/getUpdates`. Store the values as `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID`.

If either secret is unset, all `notify-telegram.sh` calls in the workflows skip cleanly with no error, so the pipeline runs end-to-end without Telegram configured.

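The skip-when-unset behaviour described above could look roughly like the guard below. This is a sketch and not the real `notify-telegram.sh`: the function name `notify_telegram` and the log line are assumptions, while `sendMessage` with `chat_id` and `text` is the standard Telegram Bot API call.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the skip-when-unset guard; not the real notify-telegram.sh.
notify_telegram() {
  local text="$1"
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "telegram: secrets unset, skipping" >&2
    return 0   # report success so the calling workflow step stays green
  fi
  curl -fsS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=${text}" >/dev/null
}
```

Returning 0 on the skip path is the design choice that matters here: a non-zero exit would fail the workflow step for an optional notification.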
### 7. Gitea Actions secrets summary

Repo → Settings → Actions → Secrets. Set the following:

| Secret | Required | Purpose |
|---|---|---|
| `BASIC_AUTH_USER`, `BASIC_AUTH_PASS` | yes | nginx htpasswd for `ui-dashboard.gnerim.ru` |
| `MAP_TILE_URL` | optional | Default `/map/api/tile/{z}/{x}/{y}.jpeg` |
| `API_BASE_URL` | optional | Default `/api` |
| `GITLAB_PAT`, `GITLAB_PROJECT_ID` | yes (release only) | GitLab MR API |
| `TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID` | optional | Notifications |
| `GITHUB_TOKEN` | auto | Provided by Gitea Actions; no manual setup required |

Jenkins is triggered manually after the release workflow merges to GitLab; no Jenkins secret is required.

## Verifying failure paths

Run at least the rollback and "release blocked" rehearsals once before declaring the pipeline production-grade.

### A: e2e fail → rollback

Push a commit that adds `console.error('rehearsal')` somewhere that runs on every page (e.g. `src/routes/layout.tsx`). Workflow A runs, e2e fails on the console gate, and the rollback to `:previous` triggers. Verify:

- Telegram message: `❌ ci-deploy FAILED at step "Run Playwright e2e" — rolled back to <prev-sha>`
- `https://ui-dashboard.gnerim.ru/` still serves the previous version (check the page or `docker inspect flights-web`).

Revert the rehearsal commit when done.

### A: rollback itself fails

Remove the rollback image first:

```bash
ssh pve-201 'docker rmi flights-web:previous'
```

Then push a commit that fails e2e. The rollback step finds no `:previous` image and bails. Verify:

- Telegram message: `🔥 ci-deploy ROLLBACK FAILED — site is DOWN`
- `https://ui-dashboard.gnerim.ru/` returns 502.
- Manual recovery: `ssh pve-201 'docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null; docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:3002:8080 flights-web:<known-good-sha>'`.

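The manual recovery one-liner in the last bullet is easy to mistype under pressure. One option is to wrap it in a function that can be rehearsed without touching Docker. The sketch below is hypothetical: `recover_flights_web` and the `DRY_RUN` switch are not part of the repository. It defaults to dry-run and only executes the commands when `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the manual recovery one-liner.
# DRY_RUN=1 (the default here) only prints the docker commands.
recover_flights_web() {
  local tag="$1"
  local run=""
  [ "${DRY_RUN:-1}" = "1" ] && run="echo"
  # Stop and remove any existing container; ignore errors if it is absent.
  $run docker stop flights-web 2>/dev/null || true
  $run docker rm flights-web 2>/dev/null || true
  # Start the chosen image on the loopback port that nginx proxies to.
  $run docker run -d --name flights-web --restart unless-stopped \
    -p 127.0.0.1:3002:8080 "flights-web:${tag}"
}
```

Calling `recover_flights_web <known-good-sha>` on pve-201 prints the three commands for review; repeating the call with `DRY_RUN=0` runs them.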
### B: blocked on A not green

Trigger Workflow B (manually or by tag) for a SHA that has no green Workflow A run. Verify:

- Telegram message: `⚠️ release blocked — workflow ci-deploy is not green for <sha>`
- B exits early; nothing changes in GitLab.

## Manual recovery scenarios

### Workflow B succeeded but Jenkins build failed

GitLab is at the new commit; customer site is stale. Recovery:

1. Open Jenkins UI → check the failing build's console log
2. Fix the issue (in this repo if it's our bug, in customer's infra otherwise)
3. Push fix → Workflow A → Workflow B → trigger Jenkins again

### Container running but nginx returns 502

Check the bind:

```bash
ssh pve-201
docker ps --filter name=flights-web
curl -v http://127.0.0.1:3002/   # should return 200 (or whatever the SSR root returns)
sudo nginx -t && sudo systemctl reload nginx
```

If the container died, the `unless-stopped` restart policy should bring it back. If not:

```bash
docker logs flights-web --tail 200
docker stop flights-web 2>/dev/null; docker rm flights-web 2>/dev/null
docker run -d --name flights-web --restart unless-stopped -p 127.0.0.1:3002:8080 flights-web:current
```

### TIM tunnel is down (502 on /api/* but / works)

```bash
sudo systemctl status flights-tim-tunnel.service --no-pager
sudo journalctl -u flights-tim-tunnel.service -n 50 --no-pager
sudo systemctl restart flights-tim-tunnel.service
ss -ltn | grep ':8443\b'   # confirm listener is back
```

If the tunnel won't come up, verify that the SSH key is still authorised on webzavod and that webzavod's `ppp0` is up (`ssh webzavod 'ip -br addr show ppp0'`).
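After a restart the listener may take a moment to appear, so a single `ss` call can give a false negative. The sketch below polls for it. Both function names are hypothetical, and the 30-second limit is an arbitrary assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if `ss -ltn` output (read from stdin) shows a
# listener bound to 127.0.0.1:8443.
has_tunnel_listener() {
  grep -Eq '(^|[[:space:]])127\.0\.0\.1:8443([[:space:]]|$)'
}

# Poll once per second for up to 30 tries; the limit is an arbitrary choice.
wait_for_tunnel() {
  local i
  for i in $(seq 1 30); do
    ss -ltn | has_tunnel_listener && return 0
    sleep 1
  done
  return 1
}
```

A typical use would be `sudo systemctl restart flights-tim-tunnel.service && wait_for_tunnel || echo "tunnel still down"`.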