ci: validate /api dictionary bodies in pre-warm + nginx cache hardening

Run 544 failed because the /api/dictionary/* nginx cache had been
poisoned with the upstream WAF's HTML block page (HTTP 200 + text/html,
"Доступ к сайту временно ограничен", i.e. "Access to the site is
temporarily restricted"). The previous pre-warm step only checked
%{http_code}, so the WAF response looked valid and was cached for the
full 6h TTL. Every subsequent SSR render then got that HTML instead of
the dictionary JSON when resolving city names, so breadcrumbs showed
raw IATA codes and 7 schedule e2e specs failed.

Three changes that together close this hole:

1. ci-deploy pre-warm: two-step warm with body validation. Step 1 is
   a cache-bust query (`?_=<nanosecond timestamp>`) that proves
   upstream is healthy independently of the nginx cache. Step 2 fetches
   the canonical URL and validates that the response is JSON (starts
   with `[` or `{` and is longer than 1KB). If the canonical body is
   HTML, retry once with `Cache-Control: no-cache` to force a fresh
   upstream fetch (works once the matching nginx config below is
   deployed); if it is still HTML, fail loudly with a manual-purge
   instruction so the operator can rm the cache files.

2. nginx /api/dictionary/ location: add `proxy_cache_bypass
   $http_cache_control` so the CI workflow can force-refresh on demand,
   and `proxy_no_cache $no_cache_html` so HTML responses are never
   stored in the first place.

3. flights-api-cache.conf: add `map $upstream_http_content_type
   $no_cache_html` that flips to "1" when upstream returns text/html.
   Drives the `proxy_no_cache` filter above.
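
As a rough illustration of the body check in item 1, the sketch below
applies the same rule (first character `[` or `{`, length over 1024) to
three made-up bodies. It is a portable-sh rendering of the rule, not the
workflow code itself; the sample bodies and the `check` helper are
hypothetical.

```shell
#!/bin/sh
set -eu

# Accept a body only if it opens a JSON array/object and is longer
# than 1024 characters (same rule as the workflow's is_json helper).
is_json() {
  case "$1" in
    '['*|'{'*) ;;
    *) return 1 ;;
  esac
  [ "${#1}" -gt 1024 ]
}

# Hypothetical helper: print the verdict for a labelled sample body.
check() {
  if is_json "$2"; then
    echo "$1: accepted"
  else
    echo "$1: rejected"
  fi
}

# Made-up sample bodies, padded so that length alone cannot tell the
# JSON and HTML cases apart.
padding=$(printf '%1100s' '' | tr ' ' 'x')
json_body="[{\"code\":\"XXX\",\"name\":\"${padding}\"}]"
html_body="<!DOCTYPE html><html><body>${padding}</body></html>"
short_json='{"ok":true}'

check json_body "$json_body"     # accepted
check html_body "$html_body"     # rejected: starts with "<"
check short_json "$short_json"   # rejected: too short
```

The length floor is what rejects small JSON error objects; the
first-character test is what rejects the WAF block page.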

Note: the nginx changes only take effect after setup-pve201.sh is
re-run on pve-201. Until then, a poisoned cache entry stays poisoned
until the 6h TTL expires (or it is purged manually).
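
Until the nginx change is live, a poisoned entry can be spotted from the
Content-Type of the canonical response. The sketch below is a shell
model of the decision the `$no_cache_html` map is described as making
("1" for text/html). It is not the nginx config, and the
case-insensitive prefix match is an assumption about how the map is
written.

```shell
#!/bin/sh
set -eu

# Print "1" if the Content-Type value denotes HTML, else "0".
# Assumption: match "text/html" as a case-insensitive prefix so that
# parameters such as "; charset=utf-8" do not matter.
no_cache_html() {
  ct=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ct" in
    text/html*) echo 1 ;;
    *) echo 0 ;;
  esac
}

no_cache_html 'text/html; charset=utf-8'   # prints 1
no_cache_html 'Text/HTML'                  # prints 1
no_cache_html 'application/json'           # prints 0
no_cache_html ''                           # prints 0
```

Against a live endpoint the value could come from curl's
`%{content_type}` write-out variable, for example
`no_cache_html "$(curl -sS -o /dev/null -w '%{content_type}' "$url")"`
with the same auth options as the pre-warm step.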
2026-04-28 11:58:04 +03:00
parent 36bb2d970f
commit 39ade0102a
3 changed files with 78 additions and 9 deletions
+58 -9
@@ -123,17 +123,66 @@ jobs:
env:
BASIC_AUTH_USER: ${{ secrets.BASIC_AUTH_USER }}
BASIC_AUTH_PASS: ${{ secrets.BASIC_AUTH_PASS }}
# Two-step warm with body validation. Run 544 was bitten by cache
# poisoning: the upstream WAF returned its HTML block-page with
# HTTP 200, the previous prewarm only checked %{http_code}, so
# nginx happily cached the HTML as a valid 200 for 6h and every
# subsequent dictionary read returned HTML instead of JSON.
#
# Step 1: validate upstream via cache-bust query (`?_=<ts>` lands
# on a unique nginx cache key, forcing an upstream fetch).
# Step 2: warm + validate the canonical URL. If the canonical
# response is HTML, attempt one cache-bypass retry
# (`Cache-Control: no-cache` — works after the matching
# nginx config update). If still HTML, fail loudly with a
# purge instruction so the operator can clear cache.
run: |
# The four dictionary endpoints (see src/shared/dictionaries/api.ts)
# are read by every page load — fetch them once before e2e to warm
# nginx's proxy_cache. Subsequent e2e fetches hit the cache instead
# of the upstream WAF, which has a low per-source-IP rate limit.
# Brief sleep between requests to avoid tripping the WAF on the
# cold-cache pass.
set -euo pipefail
is_json() {
local body="$1"
local first_byte=${body:0:1}
[ "$first_byte" = "[" ] || [ "$first_byte" = "{" ] || return 1
[ ${#body} -gt 1024 ] || return 1
}
fail_with_body() {
local label="$1" body="$2"
echo "::error::pre-warm failed: $label" >&2
echo "first 200 bytes of body:" >&2
printf '%s\n' "${body:0:200}" >&2
exit 1
}
for path in world_regions countries cities airports; do
url="https://ui-dashboard.gnerim.ru/api/dictionary/1/${path}"
rc=$(curl -k -sS -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" -o /dev/null -w "%{http_code}" "$url")
echo "warm $path -> HTTP $rc"
base="https://ui-dashboard.gnerim.ru/api/dictionary/1/${path}"
# Step 1: prove upstream is healthy (cache-bust via query).
bust_url="${base}?_=$(date +%s%N)"
bust_body=$(curl -k -sS -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
--max-time 15 "$bust_url")
if ! is_json "$bust_body"; then
fail_with_body "${path} upstream returned non-JSON (WAF rate-limit?)" "$bust_body"
fi
# Step 2: warm + validate canonical URL.
cano_body=$(curl -k -sS -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
--max-time 15 "$base")
if ! is_json "$cano_body"; then
# Canonical hit poisoned cache. Force-refresh once via
# `Cache-Control: no-cache` (proxy_cache_bypass on the
# /api/dictionary/ location forwards to upstream, then
# stores the fresh response).
cano_body=$(curl -k -sS -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
-H "Cache-Control: no-cache" \
--max-time 15 "$base")
if ! is_json "$cano_body"; then
echo "::error::cache poisoned for ${path} — Cache-Control: no-cache did not refresh" >&2
echo "::error::manual purge: ssh pve-201 'rm -rf /var/cache/nginx/flights-api/* && systemctl reload nginx'" >&2
fail_with_body "${path} canonical URL still non-JSON after bypass" "$cano_body"
fi
echo "warm $path -> ok via cache-bypass (cache had been poisoned, now refreshed; ${#cano_body} bytes)"
else
echo "warm $path -> ok (${#cano_body} bytes)"
fi
sleep 2
done
echo "--- verify cache HIT on a re-fetch ---"