Analytics with Postgres
OLTP truth, OLAP speed

Keep the system of record in PostgreSQL. Replicate the parts that need to answer fast aggregate questions into ClickHouse. You get transactional correctness on writes, sub-second queries over billions of rows on reads, and one schema you actually wrote — not two stacks pulling against each other.

Why pair Postgres with ClickHouse

Postgres and ClickHouse are good at different shapes of workload. The honest answer is to use both, with Postgres as the canonical source and ClickHouse as a derived analytical projection.

  • Transactional integrity stays in Postgres. Foreign keys, unique constraints, SERIALIZABLE isolation, partial indexes, row-level security — the things you actually need to model a business — live where they belong. ClickHouse deliberately doesn't do any of that.
  • Aggregate queries get a 10-1000x speedup. ClickHouse's columnar storage, vectorized execution, and MergeTree family compress wide event tables by 10-30x and scan billions of rows per second per node. Dashboard queries that took 8 seconds on a Postgres replica return in 80 ms.
  • The OLTP path stays unburdened. Heavy GROUP BY over the last 90 days no longer evicts your hot index pages from shared buffers. Production latency stops correlating with whoever opened the BI tool.
  • One developer experience. Engineers write the schema once, in Postgres DDL. The analytics warehouse is a generated downstream — schemas are mirrored, not invented in a separate modelling tool.
  • Reversible. Because Postgres is the source of truth, ClickHouse can be rebuilt from scratch at any time by replaying logical replication. No two-way drift.
  • Cost-efficient retention. Cold partitions roll off Postgres after 30-90 days; ClickHouse keeps multi-year history at a fraction of the storage cost thanks to column-level compression (ZSTD, LZ4) and TTL-based tiering to object storage.
  • Native interop. ClickHouse ships a first-class PostgreSQL table engine and a MaterializedPostgreSQL database engine that consumes Postgres logical replication directly — no glue code required for the common case.

Reference architecture

Writes land in Postgres. A logical replication slot (pgoutput) emits row-level changes; ClickHouse's MaterializedPostgreSQL engine — or a standalone CDC pipeline (Debezium → KafkaNull table → MV) — consumes the stream and merges it into a ReplacingMergeTree. Reporting tools query ClickHouse; the application keeps talking to Postgres.

-- Source of truth: enforced constraints, normal Postgres modeling.
CREATE TABLE events (
    id            bigserial PRIMARY KEY,
    user_id       bigint      NOT NULL REFERENCES users(id),
    occurred_at   timestamptz NOT NULL DEFAULT now(),
    event_type    text        NOT NULL,
    properties    jsonb       NOT NULL DEFAULT '{}'::jsonb,
    revenue_cents int         CHECK (revenue_cents >= 0)
);

-- Expose the table to logical replication
ALTER TABLE events REPLICA IDENTITY FULL;

CREATE PUBLICATION analytics_pub FOR TABLE events, users;
SELECT pg_create_logical_replication_slot('analytics_slot', 'pgoutput');
-- Columnar projection for fast OLAP. Sort key first, compression second.
CREATE TABLE events
(
    id            UInt64,
    user_id       UInt64,
    occurred_at   DateTime64(3, 'UTC'),
    event_type    LowCardinality(String),
    properties    String CODEC(ZSTD(3)),
    revenue_cents Int32,
    _version      UInt64
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY toYYYYMM(occurred_at)
ORDER BY (event_type, user_id, occurred_at)
TTL occurred_at + INTERVAL 2 YEAR TO VOLUME 'cold';

-- Typical dashboard query: 4.2B rows scanned in ~120 ms on a single node.
SELECT toStartOfHour(occurred_at) AS hour,
       event_type,
       count()                    AS n,
       sum(revenue_cents) / 100.0 AS revenue_usd
FROM events
WHERE occurred_at >= now() - INTERVAL 7 DAY
GROUP BY hour, event_type
ORDER BY hour;
-- Option A: native MaterializedPostgreSQL engine (lowest-friction)
CREATE DATABASE analytics
ENGINE = MaterializedPostgreSQL(
    'primary.cluster.nyas.io:5432',
    'app',
    'replica_user',
    'replica_pw'
)
SETTINGS materialized_postgresql_tables_list = 'events,users';

-- Option B: Debezium -> Kafka -> ClickHouse Kafka engine
CREATE TABLE events_kafka
(
    payload String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'pg.public.events',
         kafka_format      = 'JSONAsString',
         kafka_group_name  = 'ch-events';

CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT JSONExtractUInt(payload, 'after', 'id')                       AS id,
       JSONExtractUInt(payload, 'after', 'user_id')                  AS user_id,
       parseDateTime64BestEffort(JSONExtractString(payload, 'after', 'occurred_at')) AS occurred_at,
       JSONExtractString(payload, 'after', 'event_type')             AS event_type,
       JSONExtractString(payload, 'after', 'properties')             AS properties,
       JSONExtractInt(payload, 'after', 'revenue_cents')             AS revenue_cents,
       JSONExtractUInt(payload, 'source', 'lsn')                     AS _version
FROM events_kafka;

Query patterns that pay off

  • Funnels and retention over hundreds of millions of events. ClickHouse's windowFunnel and retention functions express in one line what becomes a multi-CTE nightmare in pure SQL.
  • High-cardinality GROUP BY — per-user, per-account, per-device rollups — that would hash-aggregate Postgres into a swap storm, returning interactively from the columnar side.
  • Approximate analytics (uniqHLL12, quantileTDigest) for distinct counts and percentiles at scale, with deterministic state that can be merged across shards.
  • Joins back to Postgres on the hot edge via the postgresql() table function or the PostgreSQL table engine, when an analytical query needs a fresh dimension that hasn't replicated yet.

When to use this stack

  • Reporting queries are starting to pull production Postgres latency around with them.
  • You have at least one event-shaped table (clickstream, audit log, telemetry, orders) on track to cross 100M rows.
  • You want a single team owning both stores with one schema definition, not a separate data platform org and a quarterly modelling debate.
  • You'd rather scale the cheaper, simpler half of the system (ClickHouse for scans) than overpay for read replicas of an OLTP engine doing work it wasn't designed for.