Data model — canonical, workspace, and provenance

Two-tier data: a shared canonical layer (the 5.7M-row CH corpus) plus a per-workspace overlay (notes, tags, segments, enrichment).

InsightBase has a two-tier data model: a single shared canonical layer that holds the UK companies corpus, plus a per-workspace overlay for everything your team adds to those companies (notes, tags, segments, enrichment, watchlist).

Canonical layer

Tables prefixed ch_* and enr_canonical_* are platform-wide. Every authenticated user can read them. Writes are restricted to platform-admin paths (or controlled RPCs / triggers). Examples:

  • ch_companies — 5.7M rows. Name, number, status, type, jurisdiction, dates, accounts, addresses (FK).
  • ch_addresses — distinct registered-office addresses. Foreign-keyed from ch_companies via registered_office_address_id.
  • ch_company_sic_codes + ch_sic_codes — SIC code memberships and lookup.
  • enr_canonical_facts — promoted enrichment facts (only after governance review).
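
As a rough illustration of how the address foreign key might be used, the sketch below builds trimmed-down stand-ins for ch_companies and ch_addresses in an in-memory SQLite database and joins them on registered_office_address_id. Only the table names and that foreign-key column come from this page; the other column names (id, postcode, company_number), the sample rows, and the companies_at_address helper are assumptions made for the example, and the real schemas carry more columns.

```python
import sqlite3

# Hypothetical, trimmed-down stand-ins for two canonical tables.
# Only the table names and registered_office_address_id come from the docs;
# every other column name and all sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ch_addresses (
        id INTEGER PRIMARY KEY,
        postcode TEXT
    );
    CREATE TABLE ch_companies (
        company_number TEXT PRIMARY KEY,
        name TEXT,
        registered_office_address_id INTEGER REFERENCES ch_addresses(id)
    );
    INSERT INTO ch_addresses VALUES (1, 'AB1 2CD');
    INSERT INTO ch_companies VALUES ('00000001', 'EXAMPLE ONE LTD', 1);
    INSERT INTO ch_companies VALUES ('00000002', 'EXAMPLE TWO LTD', 1);
""")


def companies_at_address(address_id: int) -> list[str]:
    """Return names of companies whose registered office is address_id."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM ch_companies AS c
        JOIN ch_addresses AS a ON a.id = c.registered_office_address_id
        WHERE a.id = ?
        ORDER BY c.company_number
        """,
        (address_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(companies_at_address(1))
```

Because addresses are stored once and referenced by key, several companies registered at the same office share a single ch_addresses row.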

Workspace layer

Tables prefixed workspace_* and company_*, the segments tables, and the non-canonical enr_* tables. All of them are scoped by workspace_id through row-level security (RLS), so you see only your workspace's rows.

  • company_notes, company_tags — free-form annotations.
  • segments, segment_companies — saved searches and their members.
  • company_watchlist — companies promoted to higher refresh priority.
  • enr_* (non-canonical) — workspace enrichment tasks, candidates, evidence, AI summaries.
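
The effect of workspace scoping can be pictured with a small in-memory model. This is not how RLS is implemented (the database enforces it, not application code); it is a minimal sketch of the visible result, assuming a company_notes row carries a workspace_id, a company number, and a body. The CompanyNote class, its field names other than workspace_id, the sample data, and visible_notes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyNote:
    workspace_id: str      # scoping column named in the docs
    company_number: str    # assumed column name
    body: str              # assumed column name


# Illustrative stand-in for the company_notes table.
NOTES = [
    CompanyNote("ws_a", "00000001", "Met the founder at a trade show"),
    CompanyNote("ws_b", "00000001", "Competitor of our client"),
    CompanyNote("ws_a", "00000002", "Follow up next quarter"),
]


def visible_notes(workspace_id: str, company_number: str) -> list[str]:
    """Mimic the RLS filter: return only notes owned by the caller's workspace."""
    return [
        note.body
        for note in NOTES
        if note.workspace_id == workspace_id
        and note.company_number == company_number
    ]


print(visible_notes("ws_a", "00000001"))
```

Two workspaces can annotate the same canonical company, and neither sees the other's notes.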

Freshness signals

Each ch_companies row carries:

  • first_seen_at — when InsightBase first ingested this company.
  • last_source_refresh_at — last time we hit /company/{n} and got fresh data.
  • last_bulk_seen_at — last time the monthly bulk snapshot included this company.
  • freshness_status — derived: fresh, stale, unknown.
  • refresh_priority — drives cadence: critical 6h, high 24h, normal 7d, low 30d.
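
To make the cadence values concrete, here is a minimal sketch of how a status could be derived from these fields. The cadence table mirrors the values listed above. The derivation rule itself is an assumption, since this page does not state it: a row that has never been refreshed is treated as unknown, and a row is treated as stale once its last refresh is older than the cadence for its priority. The function name derive_freshness_status is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Cadence per refresh_priority, as listed in the docs.
REFRESH_CADENCE = {
    "critical": timedelta(hours=6),
    "high": timedelta(hours=24),
    "normal": timedelta(days=7),
    "low": timedelta(days=30),
}


def derive_freshness_status(
    last_source_refresh_at: Optional[datetime],
    refresh_priority: str,
    now: datetime,
) -> str:
    """Return 'fresh', 'stale', or 'unknown'.

    ASSUMPTION: the real derivation rule is not documented here; this sketch
    compares the age of the last refresh with the cadence for the priority.
    """
    if last_source_refresh_at is None:
        return "unknown"  # never refreshed from the source
    age = now - last_source_refresh_at
    return "fresh" if age <= REFRESH_CADENCE[refresh_priority] else "stale"


now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
print(derive_freshness_status(now - timedelta(hours=7), "critical", now))
```

Under this assumed rule, the same seven-hour-old refresh is stale for a critical company but fresh for one at any lower priority.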

Why this split

Sharing the CH corpus is the only way to keep storage and refresh costs sane — duplicating 5.7M rows per workspace would be wasteful and would cause every workspace to drift independently. Workspace overlays let teams own their interpretation (segments, tags, enrichment) without contaminating the canonical truth.
