An Intro to ClickHouse ReplacingMergeTree
Learn when to use ReplacingMergeTree in ClickHouse, how deduplication actually works, why FINAL matters, and the common schema design mistakes.
ClickHouse
This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.
ReplacingMergeTree is one of the first ClickHouse features people encounter when they realize ClickHouse does not handle row-by-row updates like a traditional OLTP database. Instead of updating rows in place, a common ClickHouse pattern is to insert newer versions of a row and let background merges keep the latest one.
That sounds simple, but it creates a lot of confusion. Teams expect immediate deduplication, misunderstand what key is used to decide duplicates, or reach for FINAL everywhere and accidentally make queries expensive.
This guide covers the practical model.
In this post, we’ll cover:
- What ReplacingMergeTree is
- How deduplication actually works
- How to choose the right ORDER BY key
- When to use FINAL
- Common mistakes
What ReplacingMergeTree does
ReplacingMergeTree is a MergeTree-family engine designed for “replace old version with new version” workloads. Instead of mutating existing rows in place, you insert a new version of the row. During background merges, ClickHouse keeps the latest version and discards older duplicates according to the deduplication key.
That makes it useful for:
- CDC ingestion
- last-write-wins entity state
- upsert-like analytical workflows
- event streams where newer rows supersede older ones
It is not a magic transactional update system. It is an analytical storage pattern that makes update-like behavior practical in ClickHouse.
The most important concept: deduplication uses ORDER BY
This is the part many teams miss.
In ReplacingMergeTree, deduplication is based on the sorting key defined in ORDER BY, not on some hidden relational primary key concept. ClickHouse’s own data modeling content makes this explicit: the ordering key determines how duplicates are identified for ReplacingMergeTree.
That means your schema design matters a lot. If your ORDER BY does not uniquely identify the logical row you are trying to replace, ReplacingMergeTree will not behave the way you expect.
A simple example
CREATE TABLE orders
(
order_id UInt64,
customer_id UInt64,
status String,
version UInt64,
updated_at DateTime
)
ENGINE = ReplacingMergeTree(version)
ORDER BY order_id;
Then write a newer version of a row by inserting again:
INSERT INTO orders VALUES
(1001, 42, 'pending', 1, now());
INSERT INTO orders VALUES
(1001, 42, 'paid', 2, now());
Eventually, merges keep the later version for order_id = 1001.
That “eventually” is important.
Deduplication is not always immediate
ReplacingMergeTree relies on background merges. So if you query the table right after inserts, you may still see multiple versions of the same row until merges run.
This is the source of a lot of “ReplacingMergeTree is broken” confusion. The engine is working as designed. The table just has not merged away older versions yet.
If you need query-time deduplication immediately, you can use FINAL:
SELECT *
FROM orders FINAL
WHERE order_id = 1001;
But FINAL should be used carefully because it can make queries more expensive. It is a correctness tool, not something you want to sprinkle over every large analytical query by default.
Choosing the right ORDER BY key
The best design rule is:
- put the columns that identify the logical row in the deduplication key
For a user profile table, that may be user_id.
For an order state table, that may be order_id.
For a composite entity, it may be more than one column.
What teams often do wrong is optimize only for query filtering and forget that ReplacingMergeTree also uses the sorting key for deduplication. You need a key that works for both storage efficiency and replacement semantics.
If you need broader query optimization on columns outside the main ordering key, look at ClickHouse projections instead of forcing the wrong deduplication key.
Should you add a version column?
Usually, yes.
A version column makes it explicit which row is newer. Without it, ClickHouse may keep the last inserted row by part order, which is usually not the robustness you want in serious pipelines.
A common pattern is:
ENGINE = ReplacingMergeTree(version)
where version is a monotonic sequence, event timestamp, or CDC version value.
If your source system already provides LSNs, sequence numbers, or monotonically increasing update versions, use them.
When ReplacingMergeTree is a good fit
It works well for:
- CDC replication from transactional systems
- latest-state analytical tables
- dimensions that receive updates over time
- workloads where inserts are natural and row replacement is eventual
It is less attractive when:
- you need strict immediate row-level update semantics
- duplicate visibility between merges is unacceptable
- the entity key is unclear or unstable
In those cases, reconsider the modeling pattern or use a different ClickHouse engine/workflow.
See what QueryPlane can build for you
Connect to your database, write SQL with AI, and build shareable apps — all from your browser.
Common mistakes
Using the wrong ORDER BY key
This is the biggest one. If ORDER BY does not match the logical uniqueness of the row, deduplication will not reflect your intent.
Expecting merges to behave like transactions
ReplacingMergeTree is eventual with respect to merge-driven deduplication. It is not OLTP semantics.
Using FINAL everywhere
FINAL is useful, but it is not free. Overusing it can erase the performance advantage you were hoping to get from ClickHouse in the first place.
Forgetting that query design still matters
ReplacingMergeTree solves one class of modeling problem. It does not remove the need to think carefully about PARTITION BY, ORDER BY, and the shape of downstream queries.
A good decision rule
Use ReplacingMergeTree when you can honestly say:
- Newer rows supersede older rows.
- Eventual merge-driven replacement is acceptable.
- You can define a stable deduplication key in
ORDER BY.
If any of those is false, step back before defaulting to the engine.
Frequently asked questions
What is ReplacingMergeTree in ClickHouse?
ReplacingMergeTree is a table engine that deduplicates rows in the background as parts merge, keeping the row with the highest version per ORDER BY key. It is the standard way to model upsert-like behavior in an analytical store that does not natively support row-level updates.
Is ReplacingMergeTree the same as an OLTP upsert?
No. The “replace” happens during background merges, not at write time. A query that runs before the next merge will still see both the old and new row unless you explicitly tell it to look only at the latest version (via FINAL, an aggregating query, or argMax). It is upsert-by-eventual-consistency, not transactional upsert.
How do I get the latest row in a ReplacingMergeTree?
Either run the query with FINAL (correct but heavier), or write argMax(<col>, <version>) GROUP BY <key> so you collapse to the latest row inline, or use the ORDER BY <key>, <version> DESC LIMIT 1 BY <key> pattern. The right choice depends on how many keys you are reading and whether the cost of FINAL is acceptable.
What is the ver column in ReplacingMergeTree?
ver (the version column you pass to the engine: ReplacingMergeTree(ver)) is the tiebreaker between duplicate rows with the same ORDER BY key — the row with the largest ver wins. If you omit it, ClickHouse keeps whichever row is encountered last during the merge, which is non-deterministic. Always specify a version column in production tables.
Does ReplacingMergeTree support deletes?
Yes, through the is_deleted parameter (ReplacingMergeTree(ver, is_deleted)). Rows where is_deleted = 1 are dropped during the next merge instead of being kept as the “latest” version. Combine this with FINAL or argMax to query a logically-deleted dataset correctly.
Why are my duplicate rows still visible after an INSERT?
Because deduplication happens during merges, not at INSERT time. Newly inserted rows live in their own parts until ClickHouse merges them together, which can take seconds to many minutes depending on part count and merge pressure. If you cannot wait, query with FINAL or rewrite the query to do its own per-key collapse.
Is OPTIMIZE TABLE ... FINAL safe to run in production?
It works, but it is expensive and not a replacement for letting background merges run on their own. Each call rewrites entire parts and can starve foreground queries on a busy cluster. Reach for it for one-off remediation (after a backfill, or before publishing to a downstream system), not as a routine job.
Wrapping up
ReplacingMergeTree is one of the most useful ClickHouse engines because it gives you a practical way to model upsert-like analytical data without pretending ClickHouse is an OLTP database. The trick is understanding what it actually does:
- deduplication is based on
ORDER BY - merges are eventual
FINALis a correctness escape hatch, not the default
If you are designing these tables from scratch, pair this with our guide to ClickHouse PARTITION BY, ORDER BY, and PRIMARY KEY. And if you want the day-to-day tooling side, see our guide to the best ClickHouse GUI tools.