Sponsor licences — Home Office Register matching

131k+ sponsor licences ingested monthly. 73% match canonical companies via Phase-1 exact match; the rest go through Phase-2 fuzzy review.


InsightBase ingests the Home Office Register of Licensed Sponsors monthly and matches each licence to a canonical company. About 73% match cleanly via Phase-1 exact matching; the remainder go through Phase-2 fuzzy matching with operator review.

Phase 1 — exact match

On ingest, every licence row is canonicalised: the company name is normalised through fn_normalize_company_name (lowercase, strip a ltd|limited|plc|llp suffix, strip non-alphanumeric characters) and the result is looked up against ch_companies.normalized_company_name. A match sets matched_company_id directly.
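The normalise-then-look-up step can be illustrated with a minimal Python sketch. It is not the actual fn_normalize_company_name: the regular expressions, the function names normalize_company_name and match_exact, and the in-memory dictionary standing in for ch_companies.normalized_company_name are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional

# Assumed suffix handling, based only on the description above; the real
# fn_normalize_company_name may cover more cases.
_SUFFIX = re.compile(r"[\s.,]*\b(ltd|limited|plc|llp)\.?\s*$")
_NON_ALNUM = re.compile(r"[^a-z0-9]")


def normalize_company_name(name: str) -> str:
    """Lowercase, drop a trailing legal suffix, then drop non-alphanumerics."""
    lowered = name.lower().strip()
    without_suffix = _SUFFIX.sub("", lowered)
    return _NON_ALNUM.sub("", without_suffix)


def match_exact(licence_name: str, company_index: Dict[str, str]) -> Optional[str]:
    """Return the company id whose normalised name equals the licence's, if any.

    company_index is a hypothetical stand-in for a lookup on
    ch_companies.normalized_company_name (normalised name -> company id).
    """
    return company_index.get(normalize_company_name(licence_name))
```

With an index such as {"acmewidgets": "01234567"}, both "Acme Widgets Ltd." and "ACME WIDGETS LIMITED" would resolve to the same company, while a name with no normalised counterpart returns None and falls through to Phase 2.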

Phase 1 is fast (roughly 1–2 minutes for 130k licences) and high-confidence. It currently matches about 96k of 131k licences (73%).

Phase 2 — fuzzy match

Anything Phase 1 didn't catch becomes a candidate for Phase 2. A DuckDB worker computes trigram similarity against the canonical normalized name set and emits candidate pairs above a threshold (currently 0.85) into sponsor_licence_match_candidates.
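To make the idea of trigram similarity concrete, here is a small pure-Python sketch. It is not the DuckDB worker: the padding scheme and the Jaccard-style formula (shared trigrams divided by all distinct trigrams) are assumptions, as is the strict comparison against the 0.85 threshold, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

THRESHOLD = 0.85  # current threshold quoted in the text


def trigrams(s: str) -> Set[str]:
    """Distinct 3-character windows of s, padded so word edges count."""
    padded = f"  {s} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (assumed formula)."""
    if not a or not b:
        return 0.0
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(
    unmatched: Iterable[str],
    canonical: Iterable[str],
    threshold: float = THRESHOLD,
) -> Iterator[Tuple[str, str, float]]:
    """Yield (licence name, canonical name, score) for scores above threshold.

    Whether the boundary value itself counts is not stated in the source;
    a strict comparison is assumed here.
    """
    canonical_names = list(canonical)
    for licence_name in unmatched:
        for company_name in canonical_names:
            score = trigram_similarity(licence_name, company_name)
            if score > threshold:
                yield licence_name, company_name, score
```

The nested loop compares every unmatched name with every canonical name, which is fine for a demonstration but would be too slow at the volumes described here; a real worker would need some form of blocking or indexing.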

Operators review candidates and decide accept / reject / defer. The decision is recorded in sponsor_licence_match_candidates.status and, on accept, the parent licence's matched_company_id is set.
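The review step amounts to one status update plus, on accept, one update to the parent licence. The sketch below uses an in-memory SQLite database purely for illustration. Only sponsor_licence_match_candidates, its status column, and matched_company_id come from the text above; the sponsor_licences table name, the id, licence_id, company_id and score columns, the stored status strings, and the helper names are all hypothetical.

```python
import sqlite3

VALID_DECISIONS = {"accept", "reject", "defer"}


def create_demo_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical minimal schema; the real tables will have more columns."""
    conn.executescript(
        """
        CREATE TABLE sponsor_licences (
            id INTEGER PRIMARY KEY,
            organisation_name TEXT,
            matched_company_id TEXT
        );
        CREATE TABLE sponsor_licence_match_candidates (
            id INTEGER PRIMARY KEY,
            licence_id INTEGER REFERENCES sponsor_licences(id),
            company_id TEXT,
            score REAL,
            status TEXT
        );
        """
    )


def record_decision(conn: sqlite3.Connection, candidate_id: int, decision: str) -> None:
    """Store the operator's decision; on accept, link the licence to the company."""
    if decision not in VALID_DECISIONS:
        raise ValueError(f"unknown decision: {decision!r}")
    with conn:  # commit both updates together, or roll back on error
        row = conn.execute(
            "SELECT licence_id, company_id "
            "FROM sponsor_licence_match_candidates WHERE id = ?",
            (candidate_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no candidate with id {candidate_id}")
        conn.execute(
            "UPDATE sponsor_licence_match_candidates SET status = ? WHERE id = ?",
            (decision, candidate_id),
        )
        if decision == "accept":
            licence_id, company_id = row
            conn.execute(
                "UPDATE sponsor_licences SET matched_company_id = ? WHERE id = ?",
                (company_id, licence_id),
            )
```

Wrapping both updates in a single transaction keeps the candidate's status and the licence's matched_company_id from drifting apart if one of the writes fails.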

Why some licences never match

  • Holding-company structures (licence held by parent, but data lists trading subsidiary)
  • Licences with typos or name changes the fuzzy threshold doesn't cover
  • Recently-incorporated companies absent from older snapshots
  • Sole traders, partnerships, or charities outside the CH corpus

Filter use cases

Once matched, sponsor data is available as filters and joinable in the Intelligence Studio:

  • Has sponsor licence — boolean
  • Sponsor licence rating — A, A-rated (premium), B
  • Sponsor licence type — Worker, Temporary worker, etc.
  • Route — Skilled worker, Health and care worker, etc.
  • Sponsor licence expiry — date filter

Data refresh cadence

  • Home Office publishes the register monthly (typically 1st of the month)
  • Phase-1 ingest runs nightly until every licence is matched or classified
  • Phase-2 fuzzy match runs once per refresh, then on demand
