Platform admin operations live under /admin. This topic is the operator playbook for keeping the rails healthy and recovering from the failure modes we've seen in prod.
The cron map
30+ pg_cron jobs drive the platform. Grouped by purpose:
- Daily-feed rail: discovery, ingest, finalize, row reaper, staged recovery, manifest reconciler, verification watchdog.
- Stream consumer: every 2 min, drains the CH stream API into evt_stream_events.
- Profile refresh: every minute, claims a batch of due-for-refresh companies and pulls fresh /company/{n} data.
- Notifications: events, deliveries, sequences, escalations, triggers, digests, plus CH-specific notif bridge and email sender.
- Lifecycle: bounded refresh, scores recompute, signals generation, signal expiry.
- CH bulk: monthly bulk-seed (manual trigger), bulk people sync, daily-feed digest.
Reading rail health
Two health tables sit behind the dashboards:
- ingest_run_health_checks: every 2 min, a snapshot of every running daily-feed run with current processed-rows, rows-delta-since-last-check, and stall-seconds.
- ingest_run_health_alerts: open alerts. Each row has severity (slow / stalled / severe / warning / critical), opened_at, last_seen_at, and resolution. Cleared alerts get a resolution string (completed = run finished, progressed / recovered = stall ended, auto: … = a rule-based watchdog cleared it).
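To make the three snapshot fields concrete, here is a minimal Python sketch of how rows-delta and stall-seconds could be derived from two consecutive checks. The class, field, and function names are illustrative assumptions, not the production schema, and the real check may compute these values differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthSnapshot:
    """One health-check row for a running daily-feed run (names are illustrative)."""
    checked_at: float      # epoch seconds of this check
    processed_rows: int    # rows processed so far
    rows_delta: int        # rows processed since the previous check
    stall_seconds: float   # how long processed_rows has not moved


def next_snapshot(prev: Optional[HealthSnapshot], now: float, processed_rows: int) -> HealthSnapshot:
    """Derive the next snapshot from the previous one and the current row count."""
    if prev is None:
        # First check for this run: nothing to compare against yet.
        return HealthSnapshot(now, processed_rows, 0, 0.0)
    delta = processed_rows - prev.processed_rows
    if delta > 0:
        # Progress was made, so the stall clock resets.
        stall = 0.0
    else:
        # No progress: extend the stall by the time since the last check.
        stall = prev.stall_seconds + (now - prev.checked_at)
    return HealthSnapshot(now, processed_rows, delta, stall)
```

Under this reading, a run whose row count is flat across two 2-minute checks would show a stall of 240 seconds, and the first check with a positive delta would reset it to zero.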
Recovery actions
For days that won't auto-finalise:
- From the day card, click Run again, which re-dispatches the ingest. Most recoverable issues clear here.
- If the day is set-not-equal: open the residual list and decide whether the missing companies are recoverable (re-fetch) or permanent gaps (mark in ch_known_gaps).
- If a run is stuck in verification_pending with set_equal=false for > 60 min and the data is in place, manually finalise via fn_finalize_day_verification.
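The recovery steps above can be read as a small decision procedure. The Python sketch below is one possible encoding, assuming a hypothetical DayState record; the field names and returned action labels are invented for illustration, and only the 60-minute threshold and the verification_pending / set_equal conditions come from the playbook.

```python
from dataclasses import dataclass


@dataclass
class DayState:
    """Hypothetical summary of one day card (not the real schema)."""
    status: str               # e.g. "verification_pending"
    set_equal: bool           # whether the ingested set matches the expected set
    minutes_in_status: float  # how long the run has sat in this status
    data_in_place: bool       # operator has confirmed the data is present


def recommend_action(day: DayState, pending_limit_min: float = 60) -> str:
    """Return the playbook step that best matches the day's state."""
    stuck = (
        day.status == "verification_pending"
        and not day.set_equal
        and day.minutes_in_status > pending_limit_min
    )
    if stuck and day.data_in_place:
        # Candidate for manual finalisation via fn_finalize_day_verification.
        return "manual_finalise"
    if not day.set_equal:
        # Residuals need a human decision: re-fetch or record in ch_known_gaps.
        return "review_residuals"
    # Default first step: re-dispatch the ingest from the day card.
    return "run_again"
```

The ordering is a design choice in this sketch: the manual finalise branch is checked first because it is the narrowest condition, and Run again stays the default for everything else.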
Bulk seed
The monthly Companies House BasicCompanyData snapshot is ~5M companies. The seeder runs from a Railway worker. Two gotchas: the worker must connect via the IPv4 session pooler (port 5432), and the merge runs as a batched loop (one batch per RPC call) to avoid OOM.
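A minimal Python sketch of the batched merge loop, assuming a hypothetical merge_batch callable that wraps one RPC call and returns the number of rows it merged. The callable, the batch size, and the max_batches guard are all assumptions for illustration; the real seeder's RPC name and parameters are not shown here.

```python
from typing import Callable


def run_batched_merge(
    merge_batch: Callable[[int], int],
    batch_size: int = 50_000,
    max_batches: int = 10_000,
) -> int:
    """Call merge_batch until it reports no rows merged; return the total.

    merge_batch is a hypothetical wrapper around one RPC call: it merges up to
    batch_size staged rows and returns how many it actually merged.
    """
    total = 0
    for _ in range(max_batches):
        merged = merge_batch(batch_size)
        if merged == 0:
            # Nothing left to merge: the seed is complete.
            return total
        total += merged
    # Guard against a merge that never drains, rather than looping forever.
    raise RuntimeError("merge did not finish within max_batches")
```

Keeping each batch in its own call bounds the memory and transaction size of any single step, which is the point of the OOM note above.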
Profile refresh — performance notes
The claim function uses a partial index on ch_companies(next_refresh_due_at) WHERE next_refresh_due_at IS NOT NULL. Design throughput is 30 companies/min via 500 ms pacing. Watch the CH 429 rate: if it climbs above ~5%, drop the limit by a third (for example, from 30 to 20 per minute).
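The throttle rule can be sketched in Python as follows. The function name and its inputs are assumptions; only the ~5% threshold and the one-third reduction come from the note above, and how the 429 rate is measured in practice is left open.

```python
def adjust_claim_limit(
    current_limit: int,
    responses: int,
    throttled: int,
    max_429_rate: float = 0.05,
) -> int:
    """Return a new per-minute claim limit based on the observed 429 rate.

    responses is the number of CH responses in the observation window and
    throttled is how many of them were HTTP 429.
    """
    if responses <= 0:
        # No traffic observed: leave the limit unchanged.
        return current_limit
    rate = throttled / responses
    if rate > max_429_rate:
        # Drop the limit by a third, but never below one claim per minute.
        return max(1, current_limit - current_limit // 3)
    return current_limit
```

With the design limit of 30, a window of 100 responses containing 6 throttled ones would lower the limit to 20, while exactly 5 throttled responses would leave it at 30 because the rule triggers only above the threshold.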