Kimball vs. Inmon: The Two Titans of Data Warehouse Architecture

 · 

17 min read

notion-image
When organizations set out to build an enterprise data warehouse (EDW), two foundational schools of thought dominate the landscape:
  • Kimball’s Bottom-Up Dimensional Approach
  • Inmon’s Top-Down Corporate Information Factory
Both methodologies aim to deliver trusted, consistent, and analytics-ready data, but they differ in philosophy, design flow, and implementation strategy. Let’s dive deep into their differences with examples and practical guidance.

Kimball Methodology – Bottom-Up Approach

The Kimball approach focuses on delivering fast business value by creating data marts organized around business processes.

Architectural Flow

Plain Text
[Source Systems]
     ↓
[Staging Area]
     ↓
[Data Marts (Star/Snowflake Schemas)]
     ↓
[BI / Reporting Tools]

Key Characteristics

  • Data Marts First: Each business process (Sales, Inventory, Finance) gets its own data mart.
  • Star Schema Design: Fact tables + Dimension tables, highly denormalized.
  • Conformed Dimensions: Shared dimensions (like Time, Customer) ensure consistency across marts.
  • Rapid Delivery: Business teams see value quickly with focused marts.
👉 Example: Star Schema for Sales
Plain Text
-- Fact table
CREATE TABLE Fact_Sales (
    Sale_ID       BIGINT PRIMARY KEY,
    Date_Key      INT,
    Product_Key   INT,
    Customer_Key  INT,
    Sales_Amount  DECIMAL(10,2),
    Quantity_Sold INT
);

-- Dimension table: Product
CREATE TABLE Dim_Product (
    Product_Key   INT PRIMARY KEY,
    Product_Name  VARCHAR(100),
    Category      VARCHAR(50),
    Price         DECIMAL(10,2)
);
👉 BI Query Example (Kimball-style):
Plain Text
SELECT
    p.Category,
    SUM(f.Sales_Amount) AS Total_Sales
FROM Fact_Sales f
JOIN Dim_Product p ON f.Product_Key = p.Product_Key
GROUP BY p.Category;
  • Fast results for business analytics.
  • Optimized for querying but not ideal for long-term historical detail.

Inmon Methodology – Top-Down Approach

The Inmon approach focuses on building a centralized EDW in 3NF (normalized form) before creating downstream data marts.

Architectural Flow

Plain Text
[Source Systems]
     ↓
[ETL / Integration Layer]
     ↓
[Enterprise Data Warehouse (3NF; normalized)]
     ↓
[Departmental Data Marts (denormalized)]
     ↓
[BI / Reporting Tools]

Key Characteristics

  • Enterprise Data Warehouse First: A single, corporate-wide EDW in normalized form.
  • Subject-Oriented: Data organized around entities like Customer, Product, Order.
  • Data Marts Later: Built off EDW as denormalized subsets for reporting.
  • Longer Setup: Slower initial delivery but consistent single source of truth.
👉 Example: Inmon EDW (3NF style)
Plain Text
-- Core entities in normalized structure
CREATE TABLE Customer (
    Customer_ID   INT PRIMARY KEY,
    First_Name    VARCHAR(50),
    Last_Name     VARCHAR(50),
    Email         VARCHAR(100)
);

CREATE TABLE Product (
    Product_ID    INT PRIMARY KEY,
    Name          VARCHAR(100),
    Category      VARCHAR(50)
);

CREATE TABLE Orders (
    Order_ID      INT PRIMARY KEY,
    Customer_ID   INT,
    Order_Date    DATE,
    FOREIGN KEY (Customer_ID) REFERENCES Customer(Customer_ID)
);
👉 BI Query Example (Inmon-style, via Data Mart):
Plain Text
SELECT
    c.Last_Name,
    SUM(o.Sales_Amount) AS Total_Spent
FROM Orders o
JOIN Customer c ON o.Customer_ID = c.Customer_ID
GROUP BY c.Last_Name;
  • Strong data consistency & governance.
  • EDW as the “single version of truth”.
  • Slower to deliver analytics-ready outputs compared to Kimball.

Side-by-Side Comparison

Aspect
Kimball (Bottom-Up)
Inmon (Top-Down)
Design Flow
Data marts first → EDW by integration
EDW first → data marts by extraction
Data Model
Denormalized (star/snowflake schemas)
Normalized (3NF) in EDW, marts denormalized
Speed to Value
Fast, iterative
Slower initial, but enterprise-wide
Data Consistency
Conformed dimensions
Centralized EDW as golden source
Flexibility
Easy to extend marts per business area
Harder upfront, but stable enterprise view
Best For
Agile BI, smaller orgs, fast delivery
Large enterprises, regulatory needs
Reporting
Queries hit data marts directly
Queries hit marts, not EDW directly

Visual Comparison Diagram

Kimball (Bottom-Up)

notion image
 
Plain Text
Sales Mart    Finance Mart   HR Mart
   │              │            │
   └──► Conformed Dimensions ◄─┘
           │
         BI Tools

Inmon (Top-Down)

notion image
Plain Text
Enterprise Data Warehouse (3NF)
            │
    ┌───────┴────────┐
  Sales Mart   Finance Mart
       │               │
     BI Tools        BI Tools

Hybrid Approaches

In practice, many organizations blend Kimball and Inmon:
  • Use Inmon-style EDW for governance & compliance.
  • Build Kimball-style marts for agile analytics.
  • Add Data Vault Modeling as a flexible middle layer for scalability.
👉 Modern architectures often look like:
Plain Text
[Raw Data Lake] → [Data Vault / EDW] → [Star Schemas / BI Marts] → [Reports / AI Models]

Business scenario & raw sources

We’re tracking retail sales for customers buying products.

Raw sources (Bronze / landing)

Plain Text
SRC_CUSTOMER(CustomerID, FirstName, LastName, Email, Status, UpdatedAt, SourceSystem)
SRC_PRODUCT(ProductID, Name, Category, ListPrice, UpdatedAt, SourceSystem)
SRC_ORDER_HEADER(OrderID, CustomerID, OrderDate, Status, SourceSystem, ExtractTS)
SRC_ORDER_LINE(OrderID, LineNumber, ProductID, Quantity, UnitPrice, DiscountPct, SourceSystem, ExtractTS)
Common downstream questions:
  1. Revenue by category / month
  1. Top customers by spend
  1. Order status pipeline (open vs shipped)
  1. Slowly changing attributes (customer status changes)

1) Kimball (bottom-up, star/snowflake)

mental model

deliver subject-area data marts quickly; enforce consistency via conformed dimensions; tables are denormalized for BI.
Plain Text
[Staging] → [Star/Snowflake Marts] → [BI]

schema (star)

Plain Text
          DimCustomer        DimProduct         DimDate
               │                 │                │
               └──────┐     ┌────┘                │
                      ▼     ▼                     │
                     FactSales  ◄─────────────────┘

tables (SQL)

Dimensions
Plain Text
CREATE TABLE DimCustomer (
  CustomerKey      INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  CustomerID       VARCHAR(50) NOT NULL,      -- natural key
  FullName         VARCHAR(150),
  Email            VARCHAR(200),
  Status           VARCHAR(30),
  StartDate        DATE NOT NULL,
  EndDate          DATE,
  IsCurrent        BOOLEAN NOT NULL,
  SourceSystem     VARCHAR(50)
);

CREATE TABLE DimProduct (
  ProductKey   INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  ProductID    VARCHAR(50) NOT NULL,
  Name         VARCHAR(150),
  Category     VARCHAR(100),
  ListPrice    DECIMAL(18,2),
  StartDate    DATE NOT NULL,
  EndDate      DATE,
  IsCurrent    BOOLEAN NOT NULL,
  SourceSystem VARCHAR(50)
);

CREATE TABLE DimDate (
  DateKey   INT PRIMARY KEY,   -- e.g., 20250131
  Date      DATE NOT NULL,
  Month     INT,
  MonthName VARCHAR(20),
  Year      INT,
  Quarter   INT
);
Fact
Plain Text
CREATE TABLE FactSales (
  SalesKey       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  DateKey        INT NOT NULL REFERENCES DimDate(DateKey),
  CustomerKey    INT NOT NULL REFERENCES DimCustomer(CustomerKey),
  ProductKey     INT NOT NULL REFERENCES DimProduct(ProductKey),
  OrderID        VARCHAR(50),
  LineNumber     INT,
  Quantity       INT,
  UnitPrice      DECIMAL(18,2),
  DiscountPct    DECIMAL(5,2),
  SalesAmount    DECIMAL(18,2), -- Quantity * UnitPrice * (1-Discount)
  Status         VARCHAR(30),
  SourceSystem   VARCHAR(50),
  LoadTS         TIMESTAMP NOT NULL
);

load flow (ELT outline)

  1. Stage: copy raw → STG_* tables.
  1. SCD2 upsert to DimCustomer / DimProduct (track status, price changes).
  1. Derive measures & keys; insert FactSales with surrogate keys.
SCD2 example (DimCustomer)
Plain Text
-- Close out changed rows
UPDATE DimCustomer d
SET EndDate = CURRENT_DATE - 1, IsCurrent = FALSE
FROM Stg_Customer s
WHERE d.CustomerID = s.CustomerID
  AND d.IsCurrent = TRUE
  AND (d.Email <> s.Email OR d.Status <> s.Status);

-- Insert new current rows
INSERT INTO DimCustomer (CustomerID, FullName, Email, Status, StartDate, EndDate, IsCurrent, SourceSystem)
SELECT CustomerID,
       CONCAT(FirstName, ' ', LastName),
       Email,
       Status,
       CURRENT_DATE,
       NULL,
       TRUE,
       SourceSystem
FROM Stg_Customer s
LEFT JOIN DimCustomer d ON d.CustomerID = s.CustomerID AND d.IsCurrent = TRUE
WHERE d.CustomerID IS NULL
   OR d.Email <> s.Email
   OR d.Status <> s.Status;
Fact load (surrogate lookups)
Plain Text
INSERT INTO FactSales
(DateKey, CustomerKey, ProductKey, OrderID, LineNumber, Quantity, UnitPrice, DiscountPct, SalesAmount, Status, SourceSystem, LoadTS)
SELECT
  to_char(h.OrderDate, 'YYYYMMDD')::INT,
  dc.CustomerKey,
  dp.ProductKey,
  l.OrderID,
  l.LineNumber,
  l.Quantity,
  l.UnitPrice,
  l.DiscountPct,
  l.Quantity * l.UnitPrice * (1 - l.DiscountPct),
  h.Status,
  l.SourceSystem,
  CURRENT_TIMESTAMP
FROM Stg_Order_Line l
JOIN Stg_Order_Header h ON h.OrderID = l.OrderID
JOIN DimCustomer dc ON dc.CustomerID = h.CustomerID AND dc.IsCurrent = TRUE
JOIN DimProduct  dp ON dp.ProductID  = l.ProductID  AND dp.IsCurrent = TRUE;
BI query: monthly revenue by category
Plain Text
SELECT d.Year, d.Month, p.Category, SUM(f.SalesAmount) AS Revenue
FROM FactSales f
JOIN DimDate d   ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.Year, d.Month, p.Category
ORDER BY d.Year, d.Month, p.Category;
dbt-style model (fact)
Plain Text
-- models/marts/f_sales.sql
{{ config(materialized='incremental', unique_key='SalesKey') }}
SELECT ...
FROM {{ ref('stg_order_line') }} l
JOIN {{ ref('stg_order_header') }} h ON h.orderid = l.orderid
JOIN {{ ref('dim_customer') }} dc ON ...
JOIN {{ ref('dim_product') }}  dp ON ...
{% if is_incremental() %}
WHERE l.extractts > (SELECT max(loadts) FROM {{ this }})
{% endif %}
pros: fast BI, easy analytics
watchouts: upstream changes can ripple into facts/dims; history depth limited by SCD design

2) Inmon (top-down, EDW-first 3NF)

mental model

build a normalized EDW as enterprise truth; publish data marts off it.
Plain Text
[Staging] → [EDW 3NF] → [Dimensional Data Marts] → [BI]

EDW schema (3NF)

Plain Text
Customer(CustomerID PK, ...status...)
Product(ProductID PK, ...price...)
OrderHeader(OrderID PK, CustomerID FK, OrderDate, Status)
OrderLine(OrderID FK, LineNumber, ProductID FK, Quantity, UnitPrice, DiscountPct)
EDW DDL (trimmed)
Plain Text
CREATE TABLE EDW_Customer (
  CustomerID    VARCHAR(50) PRIMARY KEY,
  FirstName     VARCHAR(80),
  LastName      VARCHAR(80),
  Email         VARCHAR(200),
  Status        VARCHAR(30),
  RowEffStart   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  RowEffEnd     TIMESTAMP,
  RecordSource  VARCHAR(50)
);

CREATE TABLE EDW_Product (
  ProductID     VARCHAR(50) PRIMARY KEY,
  Name          VARCHAR(150),
  Category      VARCHAR(100),
  ListPrice     DECIMAL(18,2),
  RowEffStart   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  RowEffEnd     TIMESTAMP,
  RecordSource  VARCHAR(50)
);

CREATE TABLE EDW_OrderHeader (
  OrderID       VARCHAR(50) PRIMARY KEY,
  CustomerID    VARCHAR(50) NOT NULL REFERENCES EDW_Customer(CustomerID),
  OrderDate     DATE NOT NULL,
  Status        VARCHAR(30),
  RecordSource  VARCHAR(50),
  RowLoadTS     TIMESTAMP NOT NULL
);

CREATE TABLE EDW_OrderLine (
  OrderID       VARCHAR(50) REFERENCES EDW_OrderHeader(OrderID),
  LineNumber    INT,
  ProductID     VARCHAR(50) REFERENCES EDW_Product(ProductID),
  Quantity      INT,
  UnitPrice     DECIMAL(18,2),
  DiscountPct   DECIMAL(5,2),
  RecordSource  VARCHAR(50),
  RowLoadTS     TIMESTAMP NOT NULL,
  PRIMARY KEY (OrderID, LineNumber)
);

publish dimensional mart

from the EDW

Plain Text
-- DimCustomer from EDW (SCD2 if needed)
INSERT INTO DimCustomer (...)
SELECT ...
FROM EDW_Customer
-- apply effective dating → SCD2 rows

-- FactSales from EDW tables
INSERT INTO FactSales (...)
SELECT ...
FROM EDW_OrderLine l
JOIN EDW_OrderHeader h ON h.OrderID = l.OrderID
JOIN DimCustomer dc ON dc.CustomerID = h.CustomerID AND dc.IsCurrent = TRUE
JOIN DimProduct  dp ON dp.ProductID  = l.ProductID  AND dp.IsCurrent = TRUE;
pros: enterprise consistency, de-duplication, strong governance
watchouts: longer time-to-BI; more modeling upfront

3) Data Vault 2.0 (agile, auditable backbone)

mental model

store all data, all the time, insert-only with metadata. separate business keys (Hubs), relationships (Links), and descriptive history (Satellites). then publish marts.
Plain Text
[Staging] → [Hubs/Links/Sats] → [PIT/Bridge] → [Star Marts] → [BI]

schema (mini vault)

Plain Text
HubCustomer       HubOrder        HubProduct
    │               │                │
    └──────┐   ┌────┘                │
           ▼   ▼                     │
        Link_OrderLine ──────────────┘
           │      │
           │      └─ Sat_OrderLine (qty, price, discount, status…)
           └─ Sat_OrderHeader (status, dates…)
Sat_Customer  Sat_Product
hashing helpers (Python)
Plain Text
import hashlib
def h(s): return hashlib.md5(s.encode('utf-8')).hexdigest()
def hk(*vals): return h('||'.join(v if v is not None else '' for v in vals))
Hubs
Plain Text
CREATE TABLE Hub_Customer (
  Customer_HK     CHAR(32) PRIMARY KEY,
  Customer_BK     VARCHAR(50) NOT NULL, -- business key
  LoadDTS         TIMESTAMP NOT NULL,
  RecordSource    VARCHAR(50) NOT NULL
);

CREATE TABLE Hub_Product (
  Product_HK      CHAR(32) PRIMARY KEY,
  Product_BK      VARCHAR(50) NOT NULL,
  LoadDTS         TIMESTAMP NOT NULL,
  RecordSource    VARCHAR(50) NOT NULL
);

CREATE TABLE Hub_Order (
  Order_HK        CHAR(32) PRIMARY KEY,
  Order_BK        VARCHAR(50) NOT NULL,
  LoadDTS         TIMESTAMP NOT NULL,
  RecordSource    VARCHAR(50) NOT NULL
);
Link (n-ary relationship: order line)
Plain Text
CREATE TABLE Link_OrderLine (
  OrderLine_HK    CHAR(32) PRIMARY KEY, -- hash of (Order_BK, LineNumber)
  Order_HK        CHAR(32) NOT NULL,
  Product_HK      CHAR(32) NOT NULL,
  Customer_HK     CHAR(32) NOT NULL,
  LoadDTS         TIMESTAMP NOT NULL,
  RecordSource    VARCHAR(50) NOT NULL
);
Satellites (context & history)
Plain Text
CREATE TABLE Sat_Customer (
  Customer_HK   CHAR(32) NOT NULL,
  HashDiff      CHAR(32) NOT NULL, -- hash of descriptive attrs
  FullName      VARCHAR(150),
  Email         VARCHAR(200),
  Status        VARCHAR(30),
  EffectiveDTS  TIMESTAMP NOT NULL,
  LoadDTS       TIMESTAMP NOT NULL,
  RecordSource  VARCHAR(50) NOT NULL,
  PRIMARY KEY (Customer_HK, EffectiveDTS)
);

CREATE TABLE Sat_Product (
  Product_HK    CHAR(32) NOT NULL,
  HashDiff      CHAR(32) NOT NULL,
  Name          VARCHAR(150),
  Category      VARCHAR(100),
  ListPrice     DECIMAL(18,2),
  EffectiveDTS  TIMESTAMP NOT NULL,
  LoadDTS       TIMESTAMP NOT NULL,
  RecordSource  VARCHAR(50) NOT NULL,
  PRIMARY KEY (Product_HK, EffectiveDTS)
);

CREATE TABLE Sat_OrderLine (
  OrderLine_HK  CHAR(32) NOT NULL,
  HashDiff      CHAR(32) NOT NULL,
  LineNumber    INT,
  Quantity      INT,
  UnitPrice     DECIMAL(18,2),
  DiscountPct   DECIMAL(5,2),
  Status        VARCHAR(30),
  EffectiveDTS  TIMESTAMP NOT NULL,
  LoadDTS       TIMESTAMP NOT NULL,
  RecordSource  VARCHAR(50) NOT NULL,
  PRIMARY KEY (OrderLine_HK, EffectiveDTS)
);
load pattern (ELT)
  • Hubs: insert new BKs (CustomerID, ProductID, OrderID) with *_HK = md5(BK).
  • Links: insert relationships using hub HKs + link HK (md5(OrderID||LineNumber)).
  • Satellites: compute HashDiff = md5(concat(attrs)); insert only when hash changes.
sat insert (change detection)
Plain Text
INSERT INTO Sat_Customer (Customer_HK, HashDiff, FullName, Email, Status, EffectiveDTS, LoadDTS, RecordSource)
SELECT
  md5(c.CustomerID),
  md5(coalesce(c.FirstName,'') || '|' || coalesce(c.LastName,'') || '|' || coalesce(c.Email,'') || '|' || coalesce(c.Status,'')),
  c.FirstName || ' ' || c.LastName,
  c.Email,
  c.Status,
  c.UpdatedAt,
  CURRENT_TIMESTAMP,
  c.SourceSystem
FROM Stg_Customer c
LEFT JOIN (
  SELECT Customer_HK, HashDiff
  FROM Sat_Customer sc
  QUALIFY ROW_NUMBER() OVER (PARTITION BY Customer_HK ORDER BY EffectiveDTS DESC) = 1
) last_sc
ON last_sc.Customer_HK = md5(c.CustomerID)
WHERE last_sc.HashDiff IS DISTINCT FROM
      md5(coalesce(c.FirstName,'') || '|' || coalesce(c.LastName,'') || '|' || coalesce(c.Email,'') || '|' || coalesce(c.Status,''));
PIT/Bridge (for point-in-time joins & performance)
Plain Text
CREATE TABLE PIT_OrderLine AS
SELECT
  l.OrderLine_HK,
  l.Order_HK,
  l.Product_HK,
  l.Customer_HK,
  -- latest sat rows as of a calendar grain
  s_ol.EffectiveDTS as OL_EffDTS,
  s_c.EffectiveDTS  as C_EffDTS,
  s_p.EffectiveDTS  as P_EffDTS
FROM Link_OrderLine l
LEFT JOIN LATEST_SAT_OL s_ol ON s_ol.OrderLine_HK = l.OrderLine_HK
LEFT JOIN LATEST_SAT_CUSTOMER s_c ON s_c.Customer_HK = l.Customer_HK
LEFT JOIN LATEST_SAT_PRODUCT  s_p ON s_p.Product_HK  = l.Product_HK;
publish a Kimball-style star from the vault
Plain Text
CREATE VIEW DimCustomer AS
SELECT
  hc.Customer_HK   AS CustomerKey,
  sc.FullName,
  sc.Email,
  sc.Status
FROM Hub_Customer hc
JOIN (
  SELECT * FROM Sat_Customer QUALIFY ROW_NUMBER()
  OVER (PARTITION BY Customer_HK ORDER BY EffectiveDTS DESC)=1
) sc ON sc.Customer_HK = hc.Customer_HK;

CREATE VIEW DimProduct AS
SELECT
  hp.Product_HK AS ProductKey,
  sp.Name,
  sp.Category,
  sp.ListPrice
FROM Hub_Product hp
JOIN (
  SELECT * FROM Sat_Product QUALIFY ROW_NUMBER()
  OVER (PARTITION BY Product_HK ORDER BY EffectiveDTS DESC)=1
) sp ON sp.Product_HK = hp.Product_HK;

CREATE VIEW FactSales AS
SELECT
  to_char(s_ol.EffectiveDTS, 'YYYYMMDD')::INT AS DateKey,
  l.Customer_HK AS CustomerKey,
  l.Product_HK  AS ProductKey,
  l.OrderLine_HK,
  s_ol.Quantity,
  s_ol.UnitPrice,
  s_ol.DiscountPct,
  s_ol.Quantity * s_ol.UnitPrice * (1 - s_ol.DiscountPct) AS SalesAmount,
  s_ol.Status,
  s_ol.EffectiveDTS AS EffectiveDTS
FROM Link_OrderLine l
JOIN (
  SELECT * FROM Sat_OrderLine QUALIFY ROW_NUMBER()
  OVER (PARTITION BY OrderLine_HK ORDER BY EffectiveDTS DESC)=1
) s_ol ON s_ol.OrderLine_HK = l.OrderLine_HK;
pros: insert-only audit trail, agile to new sources, scales massively
watchouts: not query-optimized without PIT/bridges/marts; more objects to manage

🔁 Same BI questions, three implementations

Q1) Revenue by category & month

  • Kimball: simple group by on FactSales + DimDate + DimProduct (shown earlier).
  • Inmon: either build a view from EDW that aggregates lines or (recommended) use the published mart (same query as Kimball).
  • Data Vault: use published star or a vault reporting view over PIT/bridge (query mirrors Kimball once published).
Plain Text
-- works for Kimball, Inmon mart, or DV-published star
SELECT d.Year, d.Month, p.Category, SUM(f.SalesAmount) AS Revenue
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY d.Year, d.Month, p.Category
ORDER BY d.Year, d.Month, p.Category;

Q2) Top 10 customers by spend (last 90 days)

Plain Text
SELECT c.FullName, SUM(f.SalesAmount) AS Spend
FROM FactSales f
JOIN DimCustomer c ON f.CustomerKey = c.CustomerKey
JOIN DimDate d ON f.DateKey = d.DateKey
WHERE d.Date >= CURRENT_DATE - INTERVAL '90 day'
GROUP BY c.FullName
ORDER BY Spend DESC
FETCH FIRST 10 ROWS ONLY;

Q3) Order pipeline (open vs shipped)

  • In DV, get latest status from Sat_OrderLine or Sat_OrderHeader (via PIT).
  • In Kimball/Inmon marts, status is in Fact or a degenerate dimension.
Plain Text
SELECT Status, COUNT(*) AS Lines
FROM FactSales
GROUP BY Status;

🧱 dbt project structure (side-by-side)

Plain Text
models/
  staging/
    stg_customer.sql
    stg_product.sql
    stg_order_header.sql
    stg_order_line.sql

  -- Kimball mart
  kimball/
    dim_customer.sql   (SCD2)
    dim_product.sql    (SCD2)
    fact_sales.sql

  -- Inmon
  edw/
    edw_customer.sql
    edw_product.sql
    edw_order_header.sql
    edw_order_line.sql
  marts_from_edw/
    dim_customer.sql
    dim_product.sql
    fact_sales.sql

  -- Data Vault
  vault/
    hub_customer.sql
    hub_product.sql
    hub_order.sql
    link_orderline.sql
    sat_customer.sql
    sat_product.sql
    sat_orderline.sql
  vault_publish/
    pit_orderline.sql
    dim_customer.sql
    dim_product.sql
    fact_sales.sql
Example dbt model (vault hub):
Plain Text
-- models/vault/hub_customer.sql
{{ config(materialized='incremental', unique_key='Customer_HK') }}
SELECT
  md5(CustomerID)        AS Customer_HK,
  CustomerID             AS Customer_BK,
  CURRENT_TIMESTAMP      AS LoadDTS,
  SourceSystem           AS RecordSource
FROM {{ ref('stg_customer') }}
{% if is_incremental() %}
WHERE UpdatedAt > (SELECT COALESCE(MAX(LoadDTS),'1900-01-01') FROM {{ this }})
{% endif %}

which should you choose?

  • need fast BI with a few subject areas and clear KPIs? → Kimball.
  • need enterprise-wide consistency & governance across many systems? → Inmon (EDW first).
  • need agility, auditability, frequent source changes, and scale? → Data Vault backbone + published stars.
Modern stacks often combine them:
Plain Text
Raw/Lake → Data Vault (Silver) → Dimensional Marts (Gold) → BI/ML

if you want, i can zip this into a starter dbt repo (kimball + inmon + data-vault folders, with macros for hashing & SCD2) and include a small seed dataset so you can run models locally. just say the word and i’ll drop the files.

Recommendation

  • Choose Kimball if your org prioritizes speed, business focus, and agility.
  • Choose Inmon if you need a centralized, consistent, and governed EDW.
  • Choose Hybrid if you want both agility and governance, especially in cloud-native lakehouse setups.
Both methodologies remain cornerstones of data warehousing. The right choice depends on scale, governance needs, and time-to-value expectations.