7 Commits

Author SHA1 Message Date
gnezim f56bb97e68 nginx: extend HTML no-cache filter to /api/ (not just /api/dictionary/)
ci-deploy / build-deploy-test (push) Failing after 1m11s
Run 546 surfaced the second half of the cache-poisoning bug. /api/health
(which goes through the /api/ location, not /api/dictionary/) showed
`x-cache-status: STALE` text/html — meaning nginx had cached the WAF
HTML block page as a 200 entry, then served it via proxy_cache_use_stale
when the upstream returned 403 on a fresh fetch. The browser saw
text/html for an endpoint that should be JSON, console-gate flagged the
fail, and 5+ specs broke despite /api/dictionary/* being healthy.

Fix is the same one-liner already applied to /api/dictionary/: require
$no_cache_html (set in flights-api-cache.conf based on upstream's
Content-Type) so HTML responses are never stored. Future WAF spasms
return 403 directly to the client instead of dispensing months-old
poisoned HTML.
2026-04-28 13:13:31 +03:00
gnezim 39ade0102a ci: validate /api dictionary bodies in pre-warm + nginx cache hardening
Run 544 failed because the /api/dictionary/* nginx cache had been
poisoned with the upstream WAF's HTML block page (HTTP 200 + text/html,
"Доступ к сайту временно ограничен"). The previous pre-warm step only
checked %{http_code}, so the WAF response looked valid and got cached
for the full 6h TTL — every subsequent SSR render then resolved city
names via that HTML, breadcrumbs showed raw IATA codes, and 7 schedule
e2e specs failed.

Three changes that together close this hole:

1. ci-deploy pre-warm: two-step warm with body validation. Step 1 is
   a cache-bust query (?_=ns timestamp) that proves upstream is healthy
   independent of nginx cache. Step 2 fetches the canonical URL and
   validates the response is JSON (starts with [/{ and is >1KB). If
   the canonical body is HTML, retry once with `Cache-Control:
   no-cache` to force a fresh upstream fetch (works once the matching
   nginx config below is deployed); if still HTML, fail loudly with a
   manual-purge instruction so the operator can rm the cache files.

2. nginx /api/dictionary/ location: add `proxy_cache_bypass
   $http_cache_control` so the CI workflow can force-refresh on demand,
   and `proxy_no_cache $no_cache_html` so HTML responses are never
   stored in the first place.

3. flights-api-cache.conf: add `map $upstream_http_content_type
   $no_cache_html` that flips to "1" when upstream returns text/html.
   Drives the `proxy_no_cache` filter above.

Note: the nginx changes only take effect after setup-pve201.sh is
re-run on pve-201. Until then, any cache poisoning still stays poisoned
until the 6h TTL expires (or manual purge).
2026-04-28 11:58:04 +03:00
gnezim 3c6fa81d33 ci: pre-warm dictionary cache + give /api/dictionary 6h TTL
Adds a workflow step that fetches the four dictionary endpoints
(world_regions, countries, cities, airports — see api.ts) before
playwright runs. With the longer 6h TTL on /api/dictionary, every
e2e spec hits cache for the same 4 URLs that drive most of the
data-driven tests (breadcrumb city names, etc).

2s sleeps between warm-up calls keep the cold-cache pass under the
WAF rate-limit window.
2026-04-27 17:26:27 +03:00
gnezim b0e9aafed2 WAF rate-limit mitigation: nginx /api cache + Playwright throttle
(A) Add proxy_cache zone for ui-dashboard.gnerim.ru. /api/ caches 200 for
1m, /map/api/ for 24h. proxy_cache_use_stale serves cached content during
upstream errors (incl. 403 from WAF rate limit). proxy_cache_lock collapses
concurrent fetches for the same URI. Cache zone declared in conf.d/ (must
be in http{} context).

(B) Playwright workers=2, retries=2 in CI. Cuts the parallel burst that
trips the WAF before nginx cache warms up; retries handle the residual
flake.

setup-pve201.sh now installs the conf.d cache file and pre-creates the
cache dir with nginx-user ownership.
2026-04-27 16:40:44 +03:00
gnezim 03eeddfbf8 CI/CD pipeline: ssh -L tunnel for TIM API + manual Jenkins trigger
Two design pivots discovered during Phase B prerequisites:

Routing: Replace static-route + NAT plan with persistent ssh -L tunnel
from pve-201 to webzavod (deployment/systemd/flights-tim-tunnel.service).
nginx proxies /api/ and /map/api/ to https://127.0.0.1:8443 with SNI/Host
overrides so cert validation still targets the real hostname. No webzavod
kernel changes (no ip_forward/MASQUERADE), no /etc/hosts pin needed.

Workflow B: Drop Jenkins trigger/poll automation (operator lacks Jenkins
job-configure access and user API token access). release.yml now stops
after MR merge with a Telegram message containing the Jenkins job URL.
release-verify.yml (new, workflow_dispatch only) runs the customer-URL
e2e suite once the operator has triggered Jenkins manually and it has
completed.

Other:
- SSR loopback port 8081 -> 3002 (8081 was taken by openwebui on pve-201)
- notify-telegram.sh skips cleanly when TG secrets unset (was: hard-fail)
- README + spec addendum cover the new prereqs and removed steps
2026-04-27 11:58:39 +03:00
gnezim 0508f0f33d nginx: forward X-Forwarded-For on /api proxy blocks 2026-04-25 02:10:34 +03:00
gnezim 21a2acdb89 deployment: add nginx vhost for ui-dashboard.gnerim.ru 2026-04-25 01:57:31 +03:00