flights_web/.gitea/workflows/ci-deploy.yml
gnezim 23f8c82540
ci-deploy / build-deploy-test (push) Failing after 9m54s
ci: send browser User-Agent on every CI probe (WAF UA gate)
Run 544's real cause was deeper than just "WAF rate-limit": the
upstream WAF (flights.test.aeroflot.ru) blocks the default curl UA
unconditionally, returning its HTML "Доступ временно ограничен"
("Access temporarily restricted") page with HTTP 200. A browser-like
User-Agent (tested: Chrome/120 on Linux) passes through and gets the
real JSON.

Confirmed by direct upstream probe via the corp-VPN tunnel:
  curl -A '<default>'       → 3392 B text/html (block page)
  curl -A 'Mozilla/5.0 ...' → 28 KB+ application/json (real data)

So every prior pre-warm "warmed" the WAF block page into the nginx
cache, and the runner was effectively never reaching the API. The
previous commit's body validation would now catch this, but only
to fail fast, not to fix it. Real fix: send a browser UA.
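
Because the block page arrives with HTTP 200, the status code alone cannot tell the two responses apart; the Content-Type and the first byte of the body can. A minimal sketch of that check, assuming a hypothetical helper named `classify_probe` (it is not part of this repository) and POSIX sh:

```shell
# classify_probe CONTENT_TYPE BODY
# Hypothetical helper (illustration only): prints "html" for an HTML
# response such as the WAF block page, "json" for a JSON array or
# object, and "unknown" for anything else.
classify_probe() {
  content_type=$1
  body=$2
  case "$content_type" in
    text/html*)
      echo html
      return 0
      ;;
    application/json*)
      # A JSON content type is trusted only if the body also starts
      # like a JSON array or object.
      case "$body" in
        '['*|'{'*)
          echo json
          return 0
          ;;
      esac
      ;;
  esac
  echo unknown
}

# Live usage needs network access, so it is left commented out.
# UA and URL are placeholders set by the caller:
#   ct=$(curl -sS -o /tmp/body -A "$UA" -w '%{content_type}' "$URL")
#   classify_probe "$ct" "$(cat /tmp/body)"
```

A probe that prints `html` or `unknown` should be treated as a failure even when curl reports HTTP 200.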

Three places updated:

* scripts/ci/wait-for-url.sh — passes -A on every retry.
* ci-deploy.yml diagnose + pre-warm — UA shared via local var.
* release-verify.yml diagnose — same UA on customer-URL probes.

Note: the matching nginx config (proxy_no_cache $no_cache_html +
proxy_cache_bypass $http_cache_control on /api/dictionary/) was
deployed manually to pve-201 and verified — second hits now show
x-cache-status: HIT serving 28KB application/json. HTML responses
no longer get cached.
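
The HIT check described in the note can be repeated by hand by reading `x-cache-status` from a header dump. A minimal sketch, assuming a hypothetical helper named `cache_status` (not part of this repository) that reads `curl -sSI` output on stdin:

```shell
# cache_status: read HTTP response headers on stdin and print the
# value of x-cache-status in upper case (e.g. HIT, MISS, BYPASS).
# Prints nothing if the header is absent.
# Hypothetical helper, for illustration only.
cache_status() {
  # Strip the CRs that terminate HTTP header lines, then match the
  # header name case-insensitively and stop at the first match.
  tr -d '\r' \
    | awk -F': *' 'tolower($1) == "x-cache-status" { print toupper($2); exit }'
}

# Live usage needs network access, so it is left commented out.
# UA, credentials, and URL are placeholders set by the caller:
#   curl -k -sSI -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" "$URL" | cache_status
```

Running the probe twice against the same dictionary URL and comparing the two values shows whether the second request was served from the nginx cache.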
2026-04-28 12:26:48 +03:00

249 lines
10 KiB
YAML

name: ci-deploy
on:
  push:
    branches: [main]
  workflow_dispatch:
# Single deploy at a time per host — pve-201's docker container name
# `flights-web` is a shared mutex. Without this, back-to-back pushes
# race on `docker stop / rm / run`, with the second run hitting
# "container name already in use". Queue, don't cancel.
concurrency:
  group: ci-deploy-pve-201
  cancel-in-progress: false
jobs:
  build-deploy-test:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    env:
      # MAP_TILE_URL / API_BASE_URL are intentionally NOT exported at job level —
      # vitest validates them via Zod and rejects relative paths. Build args are
      # set inline on the docker_build step instead.
      BASIC_AUTH_USER: ${{ secrets.BASIC_AUTH_USER }}
      BASIC_AUTH_PASS: ${{ secrets.BASIC_AUTH_PASS }}
      TELEGRAM_BOT_TOKEN: ${{ secrets.TELEGRAM_BOT_TOKEN }}
      TELEGRAM_CHAT_ID: ${{ secrets.TELEGRAM_CHAT_ID }}
      FLIGHTS_WEB_PORT: '3002'
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Notify start
        if: ${{ env.TELEGRAM_BOT_TOKEN != '' }}
        run: scripts/ci/notify-telegram.sh start ci-deploy
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version-file: '.nvmrc'
      - name: Setup pnpm
        uses: pnpm/action-setup@v4
      - name: Restore pnpm cache
        uses: actions/cache@v4
        with:
          path: ~/.pnpm-store
          key: pnpm-${{ hashFiles('pnpm-lock.yaml') }}
          restore-keys: pnpm-
      - name: Install dependencies
        id: deps
        run: pnpm install --frozen-lockfile
      - name: Typecheck
        id: typecheck
        run: pnpm typecheck
      - name: Lint
        id: lint
        run: pnpm lint
      - name: Unit tests
        id: unit
        # tests/eslint/* are skipped in CI: typescript-eslint's project cache
        # doesn't see runtime-generated probe files inside the runner container,
        # though they pass locally. They're a dev-time eslint-config-drift guard
        # and re-run on `pnpm test` locally before commit.
        run: pnpm test -- --exclude 'tests/eslint/**'
      - name: CI script tests
        id: citest
        run: pnpm test:ci
      - name: Build SSR image
        id: docker_build
        env:
          # Both must be full URLs — Zod's .url() validator in src/env/index.ts
          # rejects relative paths at runtime in the browser. Same-origin works
          # because the public host is also where nginx is.
          MAP_TILE_URL: ${{ secrets.MAP_TILE_URL || 'https://ui-dashboard.gnerim.ru/map/api/tile/{z}/{x}/{y}.jpeg' }}
          API_BASE_URL: ${{ secrets.API_BASE_URL || 'https://ui-dashboard.gnerim.ru/api' }}
        run: |
          docker build -f Dockerfile.react \
            --build-arg "MAP_TILE_URL=${MAP_TILE_URL}" \
            --build-arg "API_BASE_URL=${API_BASE_URL}" \
            -t "flights-web:${GITHUB_SHA:0:7}" \
            .
      - name: Swap container
        id: swap
        run: scripts/ci/deploy-container.sh swap
      - name: Wait for health
        id: health
        env:
          BASIC_AUTH_USER: ${{ secrets.BASIC_AUTH_USER }}
          BASIC_AUTH_PASS: ${{ secrets.BASIC_AUTH_PASS }}
        run: scripts/ci/wait-for-url.sh https://ui-dashboard.gnerim.ru/ 30 2
      - name: Diagnose tunnel reachability
        id: tunnel_check
        env:
          BASIC_AUTH_USER: ${{ secrets.BASIC_AUTH_USER }}
          BASIC_AUTH_PASS: ${{ secrets.BASIC_AUTH_PASS }}
        # The upstream WAF blocks the default curl UA — every probe needs
        # a browser-like User-Agent or it gets the HTML block page.
        run: |
          UA='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36'
          echo "--- /api/health (expect 200 + x-envoy-upstream-service-time + x-cache-status) ---"
          curl -k -sSI -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" https://ui-dashboard.gnerim.ru/api/health | head -15
          echo "--- /api/dictionary/1/world_regions (expect JSON, ~5KB) ---"
          curl -k -sS -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
            -w "\n[size=%{size_download} time=%{time_total}s code=%{http_code}]\n" \
            https://ui-dashboard.gnerim.ru/api/dictionary/1/world_regions | head -c 400; echo
          echo "--- second hit on the same dict (expect HIT) ---"
          curl -k -sSI -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
            https://ui-dashboard.gnerim.ru/api/dictionary/1/world_regions | grep -iE "^HTTP|x-cache|x-envoy"
      - name: Pre-warm /api cache (dictionaries shared across e2e specs)
        id: cache_warmup
        env:
          BASIC_AUTH_USER: ${{ secrets.BASIC_AUTH_USER }}
          BASIC_AUTH_PASS: ${{ secrets.BASIC_AUTH_PASS }}
        # Two-step warm with body validation. Run 544 was bitten by cache
        # poisoning: the upstream WAF returned its HTML block-page with
        # HTTP 200, the previous prewarm only checked %{http_code}, so
        # nginx happily cached the HTML as a valid 200 for 6h and every
        # subsequent dictionary read returned HTML instead of JSON.
        #
        # Step 1: validate upstream via cache-bust query (`?_=<ts>` lands
        #         on a unique nginx cache key, forcing an upstream fetch).
        # Step 2: warm + validate the canonical URL. If the canonical
        #         response is HTML, attempt one cache-bypass retry
        #         (`Cache-Control: no-cache` — works after the matching
        #         nginx config update). If still HTML, fail loudly with a
        #         purge instruction so the operator can clear cache.
        run: |
          set -euo pipefail
          # The upstream WAF blocks the default curl UA — every fetch must
          # send a browser-like User-Agent or it returns the HTML block page.
          UA='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36'
          is_json() {
            local body="$1"
            local first_byte=${body:0:1}
            [ "$first_byte" = "[" ] || [ "$first_byte" = "{" ] || return 1
            [ ${#body} -gt 1024 ] || return 1
          }
          fail_with_body() {
            local label="$1" body="$2"
            echo "::error::pre-warm failed: $label" >&2
            echo "first 200 bytes of body:" >&2
            printf '%s\n' "${body:0:200}" >&2
            exit 1
          }
          for path in world_regions countries cities airports; do
            base="https://ui-dashboard.gnerim.ru/api/dictionary/1/${path}"
            # Step 1: prove upstream is healthy (cache-bust via query).
            bust_url="${base}?_=$(date +%s%N)"
            bust_body=$(curl -k -sS -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
              --max-time 15 "$bust_url")
            if ! is_json "$bust_body"; then
              fail_with_body "${path} upstream returned non-JSON (WAF rate-limit?)" "$bust_body"
            fi
            # Step 2: warm + validate canonical URL.
            cano_body=$(curl -k -sS -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
              --max-time 15 "$base")
            if ! is_json "$cano_body"; then
              # Canonical hit poisoned cache. Force-refresh once via
              # `Cache-Control: no-cache` (proxy_cache_bypass on the
              # /api/dictionary/ location forwards to upstream, then
              # stores the fresh response).
              cano_body=$(curl -k -sS -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
                -H "Cache-Control: no-cache" \
                --max-time 15 "$base")
              if ! is_json "$cano_body"; then
                echo "::error::cache poisoned for ${path} — Cache-Control: no-cache did not refresh" >&2
                echo "::error::manual purge: ssh pve-201 'rm -rf /var/cache/nginx/flights-api/* && systemctl reload nginx'" >&2
                fail_with_body "${path} canonical URL still non-JSON after bypass" "$cano_body"
              fi
              echo "warm $path -> ok via cache-bypass (cache had been poisoned, now refreshed; ${#cano_body} bytes)"
            else
              echo "warm $path -> ok (${#cano_body} bytes)"
            fi
            sleep 2
          done
          echo "--- verify cache HIT on a re-fetch ---"
          curl -k -sSI -A "$UA" -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" \
            https://ui-dashboard.gnerim.ru/api/dictionary/1/cities \
            | grep -iE "^HTTP|x-cache-status"
      - name: Install Playwright browsers
        id: playwright_install
        run: pnpm exec playwright install --with-deps chromium
      - name: Run Playwright e2e
        id: e2e
        env:
          BASE_URL: https://ui-dashboard.gnerim.ru
          BASIC_AUTH_USER: ${{ secrets.BASIC_AUTH_USER }}
          BASIC_AUTH_PASS: ${{ secrets.BASIC_AUTH_PASS }}
          # Skip Angular↔React parity gaps + UI-behavior mismatches that
          # need separate triage. release-verify runs the full suite.
          CI_DEPLOY: '1'
        run: pnpm test:e2e
      - name: Rollback on failure (post-deploy steps)
        if: failure() && (steps.swap.outcome == 'failure' || steps.health.outcome == 'failure' || steps.e2e.outcome == 'failure')
        id: rollback
        run: scripts/ci/deploy-container.sh rollback
      - name: Capture container logs (on failure)
        if: failure()
        run: docker logs flights-web --tail 500 > container.log 2>&1 || true
      - name: Upload artifacts on failure
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: ci-deploy-failure-${{ github.run_id }}
          path: |
            container.log
            playwright-report/
          retention-days: 7
      - name: Prune old images
        if: success()
        run: |
          docker images flights-web --format '{{.Tag}} {{.ID}}' \
            | grep -vE '^(current|previous)\b' \
            | tail -n +6 \
            | awk '{print $2}' \
            | xargs -r docker rmi 2>/dev/null || true
      - name: Notify (success)
        if: success() && env.TELEGRAM_BOT_TOKEN != ''
        run: scripts/ci/notify-telegram.sh ok ci-deploy
      - name: Notify (failure)
        if: failure() && env.TELEGRAM_BOT_TOKEN != ''
        run: scripts/ci/notify-telegram.sh fail ci-deploy "see run for details" container.log