- Enhanced wait-for-url.sh to capture HTTP status, response time, and size on failure
- Added full response capture in release-verify.yml for debugging customer URL issues
Run 549's wait-for-health logged two HTTP 502s before its third
attempt succeeded — nginx → docker forwarding hit the new container
during the ~4s window between \`docker run -d\` returning and
Node.js inside finishing its boot. The retry loop covered it but the
log was noisy and a slower boot could blow past the 30×2s budget.
Added a post-run readiness probe inside swap: poll
http://127.0.0.1:${PORT}/ on the host (docker container is published
to 127.0.0.1, runner uses host network mode) until it answers 2xx,
up to 30 attempts × 1s. Skipped under --dry-run so the tests/ci/
shell tests still pass without touching the network.
Net effect: wait-for-url against the public URL now succeeds first
attempt, and the run aborts cleanly if the SSR doesn't come up at
all instead of looking healthy because nginx happens to keep a
warmed connection.
Run 544's real cause was deeper than just "WAF rate-limit": the
upstream WAF (flights.test.aeroflot.ru) blocks the default curl UA
unconditionally, returning its HTML "Доступ временно ограничен"
page with HTTP 200. A genuine browser-like User-Agent (tested:
Chrome/120 on Linux) passes through and gets the real JSON.
Confirmed by direct upstream probe via the corp-VPN tunnel:
curl -A '<default>' → 3392b text/html (block page)
curl -A 'Mozilla/5.0 ...' → 28KB+ application/json (real data)
So every prior pre-warm "warmed" the WAF block page into the nginx
cache, and the runner was effectively never reaching the API. The
previous commit's body validation would now catch this — but only
to fail-fast, not to fix it. Real fix: send a browser UA.
Three places updated:
* scripts/ci/wait-for-url.sh — passes -A on every retry.
* ci-deploy.yml diagnose + pre-warm — UA shared via local var.
* release-verify.yml diagnose — same UA on customer-URL probes.
Note: the matching nginx config (proxy_no_cache $no_cache_html +
proxy_cache_bypass $http_cache_control on /api/dictionary/) was
deployed manually to pve-201 and verified — second hits now show
x-cache-status: HIT serving 28KB application/json. HTML responses
no longer get cached.
Two design pivots discovered during Phase B prerequisites:
Routing: Replace static-route + NAT plan with persistent ssh -L tunnel
from pve-201 to webzavod (deployment/systemd/flights-tim-tunnel.service).
nginx proxies /api/ and /map/api/ to https://127.0.0.1:8443 with SNI/Host
overrides so cert validation still targets the real hostname. No webzavod
kernel changes (no ip_forward/MASQUERADE), no /etc/hosts pin needed.
Workflow B: Drop Jenkins trigger/poll automation (operator lacks Jenkins
job-configure access and user API token access). release.yml now stops
after MR merge with a Telegram message containing the Jenkins job URL.
release-verify.yml (new, workflow_dispatch only) runs the customer-URL
e2e suite once the operator has triggered Jenkins manually and it has
completed.
Other:
- SSR loopback port 8081 -> 3002 (8081 was taken by openwebui on pve-201)
- notify-telegram.sh skips cleanly when TG secrets unset (was: hard-fail)
- README + spec addendum cover the new prereqs and removed steps
CI needs to sync to an arbitrary clone dir, not just the local sibling.
Extract the copy logic into sync-to-gitlab.sh (required target arg,
machine-friendly output); reduce sync-to-flights-front.sh to a thin
wrapper that supplies the local default and adds dev next-steps hints.