InsightBase has a two-tier data model: a single shared canonical layer that holds the UK companies corpus, plus a per-workspace overlay for everything your team adds to those companies (notes, tags, segments, enrichment, watchlist).
Canonical layer
Tables prefixed ch_* and enr_canonical_* are platform-wide. Every authenticated user can read them. Writes are restricted to platform-admin paths (or controlled RPCs / triggers). Examples:
- ch_companies — 5.7M rows. Name, number, status, type, jurisdiction, dates, accounts, and address references (foreign keys into ch_addresses).
- ch_addresses — distinct registered-office addresses, foreign-keyed from ch_companies via registered_office_address_id.
- ch_company_sic_codes + ch_sic_codes — SIC code memberships and the SIC code lookup.
- enr_canonical_facts — promoted enrichment facts (only after governance review).
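The canonical tables are linked by ordinary foreign keys and a membership table. The minimal Python sketch below shows how one company could resolve to its registered-office address and its SIC codes. It assumes simplified in-memory rows: the field names company_number, line_1 and postcode, the (company_number, sic_code) shape of ch_company_sic_codes, and the helpers registered_office and sic_codes_for are hypothetical. Only registered_office_address_id comes from the schema described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-ins for canonical rows. Only
# registered_office_address_id is taken from the documented schema.


@dataclass(frozen=True)
class Address:  # one row of ch_addresses
    id: int
    line_1: str
    postcode: str


@dataclass(frozen=True)
class Company:  # one row of ch_companies (most columns omitted)
    company_number: str
    name: str
    registered_office_address_id: Optional[int]


def registered_office(company: Company, addresses_by_id: Dict[int, Address]) -> Optional[Address]:
    """Follow the registered_office_address_id foreign key into ch_addresses."""
    if company.registered_office_address_id is None:
        return None
    return addresses_by_id.get(company.registered_office_address_id)


def sic_codes_for(
    company_number: str,
    memberships: List[Tuple[str, str]],  # hypothetical (company_number, sic_code) rows of ch_company_sic_codes
    sic_lookup: Dict[str, str],          # sic_code -> description, standing in for ch_sic_codes
) -> List[Tuple[str, str]]:
    """Join a company's SIC memberships to the lookup table."""
    return [
        (code, sic_lookup.get(code, "unknown"))
        for number, code in memberships
        if number == company_number
    ]
```

Because ch_addresses holds distinct addresses, two companies registered at the same office point at the same address row rather than at two copies of it.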
Workspace layer
Tables prefixed workspace_* or company_*, the segments tables, and the non-canonical enr_* tables. These are scoped by row-level security (RLS) on workspace_id, so you see only your workspace's rows.
- company_notes, company_tags — free-form annotations.
- segments, segment_companies — saved searches and their members.
- company_watchlist — companies promoted to higher refresh priority.
- enr_* (non-canonical) — workspace enrichment tasks, candidates, evidence, AI summaries.
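The two rules on this page, where the table prefix decides the layer and workspace_id decides visibility, can be sketched in Python as follows. This is illustrative only: in InsightBase the visibility rule is enforced by RLS in the database, and layer_of and visible_rows are hypothetical helpers, not part of the product. Treating segments and segment_companies as exact table names rather than a prefix is an assumption based on the table list above.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

# Prefix rules from this page. Canonical patterns are checked first
# because enr_canonical_* tables would otherwise match the broader enr_*.
CANONICAL_PATTERNS = ("ch_*", "enr_canonical_*")
WORKSPACE_PATTERNS = ("workspace_*", "company_*", "segments", "segment_companies", "enr_*")


def layer_of(table: str) -> str:
    """Classify a table name as 'canonical', 'workspace', or 'unclassified'."""
    if any(fnmatchcase(table, p) for p in CANONICAL_PATTERNS):
        return "canonical"
    if any(fnmatchcase(table, p) for p in WORKSPACE_PATTERNS):
        return "workspace"
    return "unclassified"


def visible_rows(rows: List[Dict[str, Any]], workspace_id: str) -> List[Dict[str, Any]]:
    """Application-side mirror of the RLS rule: keep only the caller's workspace rows."""
    return [row for row in rows if row["workspace_id"] == workspace_id]
```

The ordering in layer_of matters: checking enr_* before enr_canonical_* would misfile enr_canonical_facts as a workspace table.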
Freshness signals
Each ch_companies row carries:
- first_seen_at — when InsightBase first ingested this company.
- last_source_refresh_at — last time we hit /company/{n} and got fresh data.
- last_bulk_seen_at — last time the monthly bulk snapshot included this company.
- freshness_status — derived value: one of fresh, stale, or unknown.
- refresh_priority — sets the refresh cadence: critical every 6h, high every 24h, normal every 7d, low every 30d.
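The cadence table above translates directly into a due-time calculation. The Python sketch below computes when a row's next /company/{n} fetch is due from last_source_refresh_at and refresh_priority. The derive_freshness rule is an assumption for illustration only: this page does not say how freshness_status is actually derived (for example, whether last_bulk_seen_at plays a part), and both helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Cadences documented for refresh_priority.
REFRESH_CADENCE = {
    "critical": timedelta(hours=6),
    "high": timedelta(hours=24),
    "normal": timedelta(days=7),
    "low": timedelta(days=30),
}


def next_refresh_due(last_source_refresh_at: datetime, refresh_priority: str) -> datetime:
    """When the next /company/{n} fetch is due for a row at this priority."""
    return last_source_refresh_at + REFRESH_CADENCE[refresh_priority]


def derive_freshness(
    last_source_refresh_at: Optional[datetime], refresh_priority: str, now: datetime
) -> str:
    """ASSUMED rule: 'unknown' if never refreshed, 'stale' once past the cadence window."""
    if last_source_refresh_at is None:
        return "unknown"
    if now < next_refresh_due(last_source_refresh_at, refresh_priority):
        return "fresh"
    return "stale"
```

Under this assumed rule, a high-priority company last refreshed at 09:00 would turn stale at 09:00 the following day unless another fetch landed first.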
Why this split
Sharing the CH corpus is the only practical way to keep storage and refresh costs under control: duplicating 5.7M rows per workspace would be wasteful, and every workspace's copy would drift independently. Workspace overlays let teams own their interpretation (segments, tags, enrichment) without contaminating the canonical truth.
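The overlay pattern can be illustrated with a small read path: each workspace sees the same shared company row plus only its own annotations. In this Python sketch, company_view is a hypothetical helper and the keys company_number, name, workspace_id and tag are assumed column names for ch_companies and company_tags. Wrapping the canonical row in a read-only MappingProxyType is one way to express "overlays never write to the canonical layer" in application code; in InsightBase itself that guarantee comes from the write restrictions on canonical tables.

```python
from types import MappingProxyType
from typing import Any, Dict, List


def company_view(
    canonical_row: Dict[str, Any], tag_rows: List[Dict[str, Any]], workspace_id: str
) -> Dict[str, Any]:
    """Build one workspace's view of a company.

    canonical_row: a shared ch_companies row (hypothetical keys).
    tag_rows: company_tags rows for every workspace, filtered here the
    way RLS would filter them in the database.
    """
    number = canonical_row["company_number"]
    tags = sorted(
        row["tag"]
        for row in tag_rows
        if row["workspace_id"] == workspace_id and row["company_number"] == number
    )
    # A read-only view means overlay code cannot mutate the shared row.
    return {"company": MappingProxyType(canonical_row), "tags": tags}
```

Two workspaces calling company_view for the same company get identical canonical data but independent tag lists, which is the split this section argues for.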