Daily ingestion — how new companies arrive

Every UK company incorporated yesterday lands in InsightBase by the next morning, verified against Companies House.


The daily-feed rail brings every newly incorporated UK company into the canonical store. It runs at 00:00 London time, and its output is verified as set-equal against Companies House search before the day is locked.

The flow, end-to-end

  1. Discovery (00:00–00:30) — pulls every company incorporated yesterday from the CH advanced-search API. Lands in ch_day_expected_numbers as the truth set.
  2. Ingestion (00:00–02:00) — for each expected company, fetch /company/{number} from CH, normalise, upsert into ch_companies. Failed fetches go into a staged retry queue.
  3. Verification (varies) — once set_equal=true (every expected company is canonical), the day is marked audit_locked.
  4. Audit + digest — operational alerts fire if the day stays in verification_pending beyond the SLA window. A 10:00 London digest summarises the previous 24 hours for the platform team.
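
The verification step can be pictured as a plain set comparison. The following is a minimal sketch, not the actual InsightBase implementation: the function name, the result type and the sample company numbers are all hypothetical. It treats the day as set-equal when no expected company is missing, matching the parenthetical in step 3. Whether extra canonical rows should also block the lock is not stated here, so the sketch only reports them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationResult:
    set_equal: bool
    missing: frozenset[str]     # in the truth set but not yet canonical
    unexpected: frozenset[str]  # canonical for the day but not in the truth set


def verify_day(expected: set[str], canonical: set[str]) -> VerificationResult:
    """Compare the discovery truth set with what ingestion stored (hypothetical helper)."""
    missing = frozenset(expected - canonical)
    unexpected = frozenset(canonical - expected)
    # Assumption: only missing companies prevent set_equal; extras are reported, not blocking.
    return VerificationResult(
        set_equal=not missing,
        missing=missing,
        unexpected=unexpected,
    )


if __name__ == "__main__":
    # Made-up company numbers, for illustration only.
    result = verify_day({"15000001", "15000002"}, {"15000001"})
    print(result.set_equal, sorted(result.missing))
```

In this picture, missing_count is simply the size of the missing set, and the day can move to audit_locked only once it reaches zero.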

Healthy-day signals

  • status = completed
  • daily_feed_state = audit_locked
  • verification_status = verified
  • set_equal = true
  • missing_count = 0
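
As an illustration, the five signals above can be checked in one pass. This sketch assumes the day's state is available as a dictionary keyed by the field names listed; the record shape and the function name are assumptions, not a documented InsightBase API.

```python
# Expected values for a healthy day, taken from the list above.
HEALTHY_SIGNALS = {
    "status": "completed",
    "daily_feed_state": "audit_locked",
    "verification_status": "verified",
    "set_equal": True,
    "missing_count": 0,
}


def _matches(actual, wanted) -> bool:
    # Compare type as well as value so that 1 is not accepted for True, or False for 0.
    return type(actual) is type(wanted) and actual == wanted


def unhealthy_signals(day: dict) -> dict:
    """Return {field: (actual, expected)} for every signal that is off (hypothetical helper)."""
    return {
        field: (day.get(field), wanted)
        for field, wanted in HEALTHY_SIGNALS.items()
        if not _matches(day.get(field), wanted)
    }


if __name__ == "__main__":
    day = dict(HEALTHY_SIGNALS, set_equal=False, missing_count=3)
    print(unhealthy_signals(day))
```

An empty result means every signal matches. Anything else names the fields to look at first.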

When something goes wrong

The watchdog tracks two failure modes:

  • Stall — a run is marked slow after 180s without progress, stalled after 600s, severe after 1200s. The alert auto-clears once progress resumes.
  • Reconciliation mismatch — set-not-equal at verification time. The day stays in verification_pending until the residual companies are re-fetched or marked as permanent gaps in ch_known_gaps.
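
The stall thresholds above map naturally onto a small classifier. This is a sketch under stated assumptions: the enum, the function name and the "ok" level are hypothetical, and the boundaries are treated as inclusive (exactly 180s counts as slow), which is not specified above.

```python
from enum import Enum


class StallLevel(Enum):
    OK = "ok"            # assumed name for "no stall"; not a documented state
    SLOW = "slow"
    STALLED = "stalled"
    SEVERE = "severe"


# Thresholds in seconds without progress, checked from most to least severe.
THRESHOLDS = [
    (1200, StallLevel.SEVERE),
    (600, StallLevel.STALLED),
    (180, StallLevel.SLOW),
]


def classify_stall(seconds_since_progress: float) -> StallLevel:
    """Map time since the last progress event to a stall level (hypothetical helper)."""
    for limit, level in THRESHOLDS:
        if seconds_since_progress >= limit:  # assumption: boundaries are inclusive
            return level
    return StallLevel.OK


if __name__ == "__main__":
    for seconds in (30, 180, 750, 1200):
        print(seconds, classify_stall(seconds).value)
```

Because the level is derived from the time since the last progress event, it falls back to ok as soon as progress resumes, which is consistent with the alert auto-clearing.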

Volumes you can expect

  • Average UK incorporations per weekday: 2,000–4,500
  • Spikes on Mondays (weekend backlog) and end-of-quarter
  • End-to-end runtime on a normal day: ~90 minutes from 00:00 to audit-locked
