Run 546 surfaced the second half of the cache-poisoning bug. /api/health
(which goes through the /api/ location, not /api/dictionary/) showed
`x-cache-status: STALE` text/html — meaning nginx had cached the WAF
HTML block page as a 200 entry, then served it via proxy_cache_use_stale
when the upstream returned 403 on a fresh fetch. The browser saw
text/html for an endpoint that should be JSON, console-gate flagged the
fail, and 5+ specs broke despite /api/dictionary/* being healthy.
Fix is the same one-liner already applied to /api/dictionary/: require
$no_cache_html (set in flights-api-cache.conf based on upstream's
Content-Type) so HTML responses are never stored. Future WAF spasms
return 403 directly to the client instead of dispensing months-old
poisoned HTML.
Run 544 failed because the /api/dictionary/* nginx cache had been
poisoned with the upstream WAF's HTML block page (HTTP 200 + text/html,
"Доступ к сайту временно ограничен"). The previous pre-warm step only
checked %{http_code}, so the WAF response looked valid and got cached
for the full 6h TTL — every subsequent SSR render then resolved city
names via that HTML, breadcrumbs showed raw IATA codes, and 7 schedule
e2e specs failed.
Three changes that together close this hole:
1. ci-deploy pre-warm: two-step warm with body validation. Step 1 is
a cache-bust query (?_=ns timestamp) that proves upstream is healthy
independent of nginx cache. Step 2 fetches the canonical URL and
validates the response is JSON (starts with [/{ and is >1KB). If
the canonical body is HTML, retry once with `Cache-Control:
no-cache` to force a fresh upstream fetch (works once the matching
nginx config below is deployed); if still HTML, fail loudly with a
manual-purge instruction so the operator can rm the cache files.
2. nginx /api/dictionary/ location: add `proxy_cache_bypass
$http_cache_control` so the CI workflow can force-refresh on demand,
and `proxy_no_cache $no_cache_html` so HTML responses are never
stored in the first place.
3. flights-api-cache.conf: add `map $upstream_http_content_type
$no_cache_html` that flips to "1" when upstream returns text/html.
Drives the `proxy_no_cache` filter above.
Note: the nginx changes only take effect after setup-pve201.sh is
re-run on pve-201. Until then, any cache poisoning still stays poisoned
until the 6h TTL expires (or manual purge).
Adds a workflow step that fetches the four dictionary endpoints
(world_regions, countries, cities, airports — see api.ts) before
playwright runs. With the longer 6h TTL on /api/dictionary, every
e2e spec hits cache for the same 4 URLs that drive most of the
data-driven tests (breadcrumb city names, etc).
2s sleeps between warm-up calls keep the cold-cache pass under the
WAF rate-limit window.
(A) Add proxy_cache zone for ui-dashboard.gnerim.ru. /api/ caches 200 for
1m, /map/api/ for 24h. proxy_cache_use_stale serves cached content during
upstream errors (incl. 403 from WAF rate limit). proxy_cache_lock collapses
concurrent fetches for the same URI. Cache zone declared in conf.d/ (must
be in http{} context).
(B) Playwright workers=2, retries=2 in CI. Cuts the parallel burst that
trips the WAF before nginx cache warms up; retries handle the residual
flake.
setup-pve201.sh now installs the conf.d cache file and pre-creates the
cache dir with nginx-user ownership.
Two design pivots discovered during Phase B prerequisites:
Routing: Replace static-route + NAT plan with persistent ssh -L tunnel
from pve-201 to webzavod (deployment/systemd/flights-tim-tunnel.service).
nginx proxies /api/ and /map/api/ to https://127.0.0.1:8443 with SNI/Host
overrides so cert validation still targets the real hostname. No webzavod
kernel changes (no ip_forward/MASQUERADE), no /etc/hosts pin needed.
Workflow B: Drop Jenkins trigger/poll automation (operator lacks Jenkins
job-configure access and user API token access). release.yml now stops
after MR merge with a Telegram message containing the Jenkins job URL.
release-verify.yml (new, workflow_dispatch only) runs the customer-URL
e2e suite once the operator has triggered Jenkins manually and it has
completed.
Other:
- SSR loopback port 8081 -> 3002 (8081 was taken by openwebui on pve-201)
- notify-telegram.sh skips cleanly when TG secrets unset (was: hard-fail)
- README + spec addendum cover the new prereqs and removed steps