17 Commits

Author SHA1 Message Date
gnezim 99562c2218 Proxy Aeroflot shell in standalone 2026-05-21 11:05:04 +03:00
gnezim f56bb97e68 nginx: extend HTML no-cache filter to /api/ (not just /api/dictionary/)
ci-deploy / build-deploy-test (push) Failing after 1m11s
Run 546 surfaced the second half of the cache-poisoning bug. /api/health
(which goes through the /api/ location, not /api/dictionary/) showed
`x-cache-status: STALE` text/html — meaning nginx had cached the WAF
HTML block page as a 200 entry, then served it via proxy_cache_use_stale
when the upstream returned 403 on a fresh fetch. The browser saw
text/html for an endpoint that should be JSON, console-gate flagged the
fail, and 5+ specs broke despite /api/dictionary/* being healthy.

Fix is the same one-liner already applied to /api/dictionary/: require
$no_cache_html (set in flights-api-cache.conf based on upstream's
Content-Type) so HTML responses are never stored. Future WAF spasms
return 403 directly to the client instead of dispensing months-old
poisoned HTML.
2026-04-28 13:13:31 +03:00
gnezim 39ade0102a ci: validate /api dictionary bodies in pre-warm + nginx cache hardening
Run 544 failed because the /api/dictionary/* nginx cache had been
poisoned with the upstream WAF's HTML block page (HTTP 200 + text/html,
"Доступ к сайту временно ограничен"). The previous pre-warm step only
checked %{http_code}, so the WAF response looked valid and got cached
for the full 6h TTL — every subsequent SSR render then resolved city
names via that HTML, breadcrumbs showed raw IATA codes, and 7 schedule
e2e specs failed.

Three changes that together close this hole:

1. ci-deploy pre-warm: two-step warm with body validation. Step 1 is
   a cache-bust query (?_=ns timestamp) that proves upstream is healthy
   independent of nginx cache. Step 2 fetches the canonical URL and
   validates the response is JSON (starts with [/{ and is >1KB). If
   the canonical body is HTML, retry once with `Cache-Control:
   no-cache` to force a fresh upstream fetch (works once the matching
   nginx config below is deployed); if still HTML, fail loudly with a
   manual-purge instruction so the operator can rm the cache files.

2. nginx /api/dictionary/ location: add `proxy_cache_bypass
   $http_cache_control` so the CI workflow can force-refresh on demand,
   and `proxy_no_cache $no_cache_html` so HTML responses are never
   stored in the first place.

3. flights-api-cache.conf: add `map $upstream_http_content_type
   $no_cache_html` that flips to "1" when upstream returns text/html.
   Drives the `proxy_no_cache` filter above.

Note: the nginx changes only take effect after setup-pve201.sh is
re-run on pve-201. Until then, any cache poisoning still stays poisoned
until the 6h TTL expires (or manual purge).
2026-04-28 11:58:04 +03:00
gnezim 5273b3a7a6 setup-pve201: treat WAF 403 as warning, not fatal
The smoke test was getting 403 from the upstream WAF (rate-limit on
webzavod's egress IP). 403 doesn't indicate a tunnel/routing problem
— it confirms the egress IP IS the WAF-recognized one and is being
throttled. Don't abort the rest of setup over a transient throttle;
the only response that should hard-fail is HTTP 200 with HTML body
(WAF interstitial), which means the tunnel was bypassed.
2026-04-27 17:37:22 +03:00
gnezim 3c6fa81d33 ci: pre-warm dictionary cache + give /api/dictionary 6h TTL
Adds a workflow step that fetches the four dictionary endpoints
(world_regions, countries, cities, airports — see api.ts) before
playwright runs. With the longer 6h TTL on /api/dictionary, every
e2e spec hits cache for the same 4 URLs that drive most of the
data-driven tests (breadcrumb city names, etc).

2s sleeps between warm-up calls keep the cold-cache pass under the
WAF rate-limit window.
2026-04-27 17:26:27 +03:00
gnezim b0e9aafed2 WAF rate-limit mitigation: nginx /api cache + Playwright throttle
(A) Add proxy_cache zone for ui-dashboard.gnerim.ru. /api/ caches 200 for
1m, /map/api/ for 24h. proxy_cache_use_stale serves cached content during
upstream errors (incl. 403 from WAF rate limit). proxy_cache_lock collapses
concurrent fetches for the same URI. Cache zone declared in conf.d/ (must
be in http{} context).

(B) Playwright workers=2, retries=2 in CI. Cuts the parallel burst that
trips the WAF before nginx cache warms up; retries handle the residual
flake.

setup-pve201.sh now installs the conf.d cache file and pre-creates the
cache dir with nginx-user ownership.
2026-04-27 16:40:44 +03:00
gnezim 894113e09d Add deployment/setup-pve201.sh — one-shot Phase B host bootstrap
Idempotent: installs systemd tunnel unit, smoke-tests it, writes the
nginx vhost + htpasswd, reloads nginx. Reads BASIC_AUTH_USER/PASS from
env (use sudo -E).
2026-04-27 12:06:32 +03:00
gnezim 03eeddfbf8 CI/CD pipeline: ssh -L tunnel for TIM API + manual Jenkins trigger
Two design pivots discovered during Phase B prerequisites:

Routing: Replace static-route + NAT plan with persistent ssh -L tunnel
from pve-201 to webzavod (deployment/systemd/flights-tim-tunnel.service).
nginx proxies /api/ and /map/api/ to https://127.0.0.1:8443 with SNI/Host
overrides so cert validation still targets the real hostname. No webzavod
kernel changes (no ip_forward/MASQUERADE), no /etc/hosts pin needed.

Workflow B: Drop Jenkins trigger/poll automation (operator lacks Jenkins
job-configure access and user API token access). release.yml now stops
after MR merge with a Telegram message containing the Jenkins job URL.
release-verify.yml (new, workflow_dispatch only) runs the customer-URL
e2e suite once the operator has triggered Jenkins manually and it has
completed.

Other:
- SSR loopback port 8081 -> 3002 (8081 was taken by openwebui on pve-201)
- notify-telegram.sh skips cleanly when TG secrets unset (was: hard-fail)
- README + spec addendum cover the new prereqs and removed steps
2026-04-27 11:58:39 +03:00
gnezim a6293d9d56 ci: surface Jenkins console URL on build-timeout + document GITHUB_TOKEN auto-secret 2026-04-25 03:11:37 +03:00
gnezim de5decce03 deployment: fix recovery commands + clarify rehearsal procedures 2026-04-25 02:18:04 +03:00
gnezim 1fbd8ef23f deployment: bootstrap runbook + failure-path rehearsals 2026-04-25 02:12:25 +03:00
gnezim 0508f0f33d nginx: forward X-Forwarded-For on /api proxy blocks 2026-04-25 02:10:34 +03:00
gnezim 21a2acdb89 deployment: add nginx vhost for ui-dashboard.gnerim.ru 2026-04-25 01:57:31 +03:00
gnezim d43bfb3fcb Fix URL truncation from bash \${VAR:=default} with braces
CI / ci (push) Failing after 31s
Deploy / build-and-deploy (push) Failing after 5s
Deployed build had MAP_TILE_URL truncated to 'https://.../tile/{z' —
Leaflet then URL-encoded it to '%7Bz' and fetched garbage tile paths.
Root cause: build-docker.sh used

    : "\${MAP_TILE_URL:=https://flights.test.aeroflot.ru/map/api/tile/{z}/{x}/{y}.jpeg}"

and bash parameter expansion terminates the default value at the
FIRST unescaped '}', leaving '{z' and discarding the rest. The env
passed to `pnpm build:standalone` was already truncated, so every
downstream step (base64 encode → HTML inject → client decode) faithfully
carried the broken value through.

Fix by moving the defaults to Dockerfile's ARG lines — ARG defaults
are plain strings, not shell-parsed — and simplify build-docker.sh to
only forward MAP_TILE_URL / API_BASE_URL as --build-arg when the
caller explicitly sets them. Quote the k8s env values for defensive
YAML hygiene as well.
2026-04-19 00:44:45 +03:00
gnezim 2e2c5c09ce Fix __ENV__ truncation; route API_BASE_URL through the same injection
CI / ci (push) Failing after 30s
Deploy / build-and-deploy (push) Failing after 6s
Two gaps blocked http://flights-ui.devwebzavod.ru/ru/flights-map:

1. The inline <script>window.__ENV__=...</script> was written with the
   Leaflet tile template ('/map/api/tile/{z}/{x}/{y}.jpeg') embedded
   directly. Rspack's html-plugin pre-processes the children string and
   ate the '{z}' placeholder, truncating the injected JS literal to
   '/map/api/tile/{z' — MAP_TILE_URL on the client ended up broken and
   getEnv() fell back to the default.

   Escape every '{'/'}' inside the stringified value as '\u007B'/'\u007D'.
   JS decodes the Unicode escapes back to '{}' at parse time; the html
   plugin's template engine sees no placeholders to eat. Object-literal
   braces outside the string stay raw (Unicode escapes aren't valid in
   operator positions in JS source).

2. API_BASE_URL was still hard-defaulting to 'http://localhost:8080/api',
   so every dictionary fetch on the deployed cluster died with
   ERR_CONNECTION_REFUSED. Thread API_BASE_URL through the same
   PUBLIC_ENV_KEYS/html.tags path as MAP_TILE_URL, add matching Docker
   ARG/ENV, and forward it in deployment/build-docker.sh + k8s manifest.

The devwebzavod default for both is https://flights.test.aeroflot.ru
— where the real Aeroflot ingress terminates /map/api/** and /api/**.
Prod keeps overriding with same-origin URLs.
2026-04-18 23:56:28 +03:00
gnezim ef85ae6ea1 Inject MAP_TILE_URL into window.__ENV__ via html.tags + Docker build-arg
CI / ci (push) Failing after 32s
Deploy / build-and-deploy (push) Failing after 6s
http://flights-ui.devwebzavod.ru/ru/flights-map was still hitting the
same-origin tile path after adding the k8s env: Modern.js renders the
<Suspense> fallback on the server (i18n isn't preloaded), so the route
component that reads getEnv() never actually runs during SSR. The page
hydrates client-side, where process.env is Rspack's empty stub and
MAP_TILE_URL is never set — getEnv() falls back to the default.

Move the value into window.__ENV__ instead:

- modern.config.ts: inline a <script> into html.tags that sets
  window.__ENV__ = { MAP_TILE_URL: <value> } at SSR-server startup.
  The snippet is authored once at server boot, so the HTML template
  baked into dist/standalone/html/main/index.html always carries the
  pod's tile URL.
- src/env/index.ts: merge window.__ENV__ on top of process.env so the
  browser prefers the injected value (process.env only has NODE_ENV
  after Rspack's polyfill).
- Dockerfile.react: accept MAP_TILE_URL as a build ARG and expose it
  as ENV before `pnpm build:standalone`, so Modern.js picks it up when
  building the HTML template. k8s env still flows into the Node SSR
  process; the build-arg path guarantees correctness even when the
  runtime env is stripped.
- deployment/build-docker.sh: forward MAP_TILE_URL through as a
  build-arg (default keeps the same-origin path). CI on the
  devwebzavod cluster can export MAP_TILE_URL=https://flights.test.aeroflot.ru/map/api/tile/{z}/{x}/{y}.jpeg
  before running build-docker.sh and the resulting image will serve
  tiles from the upstream the real Aeroflot ingress terminates.
2026-04-18 23:26:56 +03:00
gnezim 496a72e7d7 Track k8s manifest; teach make sync to mirror deployment/
CI / ci (push) Failing after 32s
Deploy / build-and-deploy (push) Failing after 5s
The flights-front deploy repo ships k8s manifests at deployment/k8s/,
a sibling of Aeroflot.Flights.Front/. Previously the sync script only
copied the app source, so any env change landed on the k8s side had
to be hand-edited in the deploy repo and was never reflected back.

- Bring deployment/k8s/flights-ui.yaml into this repo (with the new
  MAP_TILE_URL env pointing at flights.test.aeroflot.ru) so the
  cluster config lives next to the code that reads it.
- sync-to-flights-front.sh resolves the deploy-repo root from the
  target path and mirrors this repo's deployment/ directory there,
  mkdir'ing and copying contents without wiping unrelated files.
- Bump step numbering (1/6..6/6) and the summary now lists the synced
  deployment files in addition to the app files.
2026-04-18 22:51:26 +03:00