Keep the system of record in PostgreSQL. Replicate the parts that need to answer fast aggregate questions into ClickHouse. You get transactional correctness on writes, sub-second queries over billions of rows on reads, and one schema you actually wrote — not two stacks pulling against each other.
Postgres and ClickHouse are good at different shapes of workload. The honest answer is to use both, with Postgres as the canonical source and ClickHouse as a derived analytical projection.
SERIALIZABLE
isolation, partial indexes, row-level security
— the things you actually need to model a
business — live where they belong. ClickHouse
deliberately doesn't do any of that.
MergeTree
family compress wide event tables by 10-30x and
scan billions of rows per second per node.
Dashboard queries that took 8 seconds on a
Postgres replica return in 80 ms.
GROUP BY over the last
90 days no longer evicts your hot index pages
from shared buffers. Production latency stops
correlating with whoever opened the BI tool.
ZSTD, LZ4) and
TTL-based tiering to object
storage.
PostgreSQL
table engine and a
MaterializedPostgreSQL database
engine that consumes Postgres logical
replication directly — no glue code required
for the common case.
Writes land in Postgres. A logical replication slot
(pgoutput) emits row-level changes;
ClickHouse's
MaterializedPostgreSQL engine — or a
standalone CDC pipeline (Debezium →
Kafka → Null table → MV)
— consumes the stream and merges it into a
ReplacingMergeTree. Reporting tools
query ClickHouse; the application keeps talking to
Postgres.
-- Source of truth: enforced constraints, normal Postgres modeling.
CREATE TABLE events (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id),
occurred_at timestamptz NOT NULL DEFAULT now(),
event_type text NOT NULL,
properties jsonb NOT NULL DEFAULT '{}'::jsonb,
revenue_cents int CHECK (revenue_cents >= 0)
);
-- Expose the table to logical replication
ALTER TABLE events REPLICA IDENTITY FULL;
CREATE PUBLICATION analytics_pub FOR TABLE events, users;
SELECT pg_create_logical_replication_slot('analytics_slot', 'pgoutput');
-- Columnar projection for fast OLAP. Sort key first, compression second.
CREATE TABLE events
(
id UInt64,
user_id UInt64,
occurred_at DateTime64(3, 'UTC'),
event_type LowCardinality(String),
properties String CODEC(ZSTD(3)),
revenue_cents Int32,
_version UInt64
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY toYYYYMM(occurred_at)
ORDER BY (event_type, user_id, occurred_at)
TTL occurred_at + INTERVAL 2 YEAR TO VOLUME 'cold';
-- Typical dashboard query: 4.2B rows scanned in ~120 ms on a single node.
SELECT toStartOfHour(occurred_at) AS hour,
event_type,
count() AS n,
sum(revenue_cents) / 100.0 AS revenue_usd
FROM events
WHERE occurred_at >= now() - INTERVAL 7 DAY
GROUP BY hour, event_type
ORDER BY hour;
-- Option A: native MaterializedPostgreSQL engine (lowest-friction)
CREATE DATABASE analytics
ENGINE = MaterializedPostgreSQL(
'primary.cluster.nyas.io:5432',
'app',
'replica_user',
'replica_pw'
)
SETTINGS materialized_postgresql_tables_list = 'events,users';
-- Option B: Debezium -> Kafka -> ClickHouse Kafka engine
CREATE TABLE events_kafka
(
payload String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'pg.public.events',
kafka_format = 'JSONAsString',
kafka_group_name = 'ch-events';
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT JSONExtractUInt(payload, 'after', 'id') AS id,
JSONExtractUInt(payload, 'after', 'user_id') AS user_id,
parseDateTime64BestEffort(JSONExtractString(payload, 'after', 'occurred_at')) AS occurred_at,
JSONExtractString(payload, 'after', 'event_type') AS event_type,
JSONExtractString(payload, 'after', 'properties') AS properties,
JSONExtractInt(payload, 'after', 'revenue_cents') AS revenue_cents,
JSONExtractUInt(payload, 'source', 'lsn') AS _version
FROM events_kafka;
windowFunnel and
retention functions express in one
line what becomes a multi-CTE nightmare in pure
SQL.
GROUP BY — per-user,
per-account, per-device rollups — that would
hash-aggregate Postgres into a swap storm,
returning interactively from the columnar side.
uniqHLL12,
quantileTDigest) for distinct
counts and percentiles at scale, with
deterministic state that can be merged across
shards.
postgresql() table function or the
PostgreSQL table engine, when an
analytical query needs a fresh dimension that
hasn't replicated yet.