ClickHouse PARTITION BY vs ORDER BY vs PRIMARY KEY

Q: What is the difference between PARTITION BY and ORDER BY in ClickHouse?

PARTITION BY decides how parts are grouped on disk for lifecycle operations — dropping old months, applying TTL, attaching backfills. ORDER BY decides how rows are physically sorted inside each part, which controls the primary index and most query performance. PARTITION BY is for operations; ORDER BY is for reads.

Q: What is the PRIMARY KEY in ClickHouse and how does it differ from a row store?

The ClickHouse PRIMARY KEY is a sparse index — it stores one entry per granule (8,192 rows by default), not one entry per row. It does not enforce uniqueness, and it has nothing to do with foreign keys. Its only job is to help the engine skip granules at read time. Treating it like a uniqueness constraint is the most common ClickHouse design mistake.

Q: Can ORDER BY and PRIMARY KEY be different in ClickHouse?

Yes, but the PRIMARY KEY must be a prefix of ORDER BY. The pattern is useful when you want a longer sort key for query layout but a shorter index key to keep the index small. If you do not specify PRIMARY KEY explicitly, ClickHouse uses the full ORDER BY as the primary key.

Q: How do I choose the right ORDER BY in ClickHouse?

Start with the columns most frequently used in WHERE predicates, ordered from low cardinality to high cardinality (e.g., (tenant_id, event_date, user_id)). The leading column should be the one you filter on most often. Avoid ordering by a high-cardinality column first — the index pays for itself when the leading column has lots of repeated values that prune well.

Q: Does PRIMARY KEY create a unique index in ClickHouse?

No. There is no enforced uniqueness in MergeTree. If you need deduplication, use ReplacingMergeTree with a version column or run an explicit dedup step before insert. Application-side uniqueness is the standard pattern.

Learn what PARTITION BY, ORDER BY, and PRIMARY KEY mean in ClickHouse, how they differ, and how to choose them for analytical workloads.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.

If you come to ClickHouse from PostgreSQL, MySQL, or Snowflake, one of the first confusing things you run into is table design syntax. PARTITION BY, ORDER BY, and PRIMARY KEY all affect how data is laid out and queried, but not in the same way they do in row-store databases.

This is also one of the highest-leverage design decisions in ClickHouse. ClickHouse’s own optimization guidance says sort key design can improve performance by 100x or more.

This guide explains the practical difference between these clauses and how to choose them.

In this post, we’ll cover:

What PARTITION BY does
What ORDER BY does
What PRIMARY KEY does
Why ORDER BY usually matters most
How to choose each one

The short version

PARTITION BY controls how data is split into partitions
ORDER BY controls physical sort order within parts
PRIMARY KEY defines the sparse index used for pruning, and must be a prefix of ORDER BY if specified separately

If you remember only one thing, remember this:

In ClickHouse, ORDER BY is often the most important decision.

What PARTITION BY is for

PARTITION BY groups data into separate partitions. This is useful for:

lifecycle management
dropping old data efficiently
reducing the scope of some maintenance operations
aligning storage with major time windows or tenant boundaries

A common example is monthly partitioning on a timestamp:

PARTITION BY toYYYYMM(event_time)

This is often a good fit for large event tables because it makes retention policies and operational maintenance simpler.

What PARTITION BY is not:

it is not your main query-performance lever
it is not a replacement for sort-key design

Too many people overuse partitioning when the real performance problem is poor ORDER BY.

What ORDER BY is for

ORDER BY defines the physical sort order of the data on disk. This is the clause that most directly shapes query performance because ClickHouse uses that order to skip data efficiently.

The ClickHouse team repeatedly emphasizes this point: the right sort key is one of the biggest performance levers in the system.

Why?

Because analytical queries usually filter on a few dimensions over a huge table. If those dimensions line up with the sort order, ClickHouse can skip large amounts of data instead of scanning almost everything.

For example:

ORDER BY (event_date, customer_id, event_time)

This can be excellent if your workload often filters by date and then customer.

What PRIMARY KEY means in ClickHouse

This is where people coming from PostgreSQL get tripped up.

In ClickHouse, PRIMARY KEY does not usually mean:

unique constraint
transactional entity identity
relational integrity feature

Instead, it defines the sparse index used for pruning. In many tables, people do not specify it separately at all, and the ORDER BY key effectively acts as the primary key.

If you do specify PRIMARY KEY separately, it must be a prefix of ORDER BY.

That means a valid pattern looks like:

PRIMARY KEY (customer_id)
ORDER BY (customer_id, event_time)

This is useful when you want a narrower sparse index than the full sort key.

A practical example

CREATE TABLE page_views
(
  event_time DateTime,
  event_date Date,
  customer_id UInt64,
  region LowCardinality(String),
  url String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (customer_id, event_date, event_time);

How to read this:

data is partitioned monthly
within each part, rows are sorted by customer, then date, then time
pruning works well for queries filtered by customer_id and date ranges

This is a much more important performance design than simply partitioning by month and leaving the sort order weak.

Why ORDER BY usually matters most

ClickHouse’s optimization guidance is clear: proper ORDER BY selection has the single biggest impact on performance in many workloads.

That means:

choose partitioning for operational boundaries
choose ordering for query speed

This is the reverse of how many newcomers think about it. They obsess over partitions first because partitioning is the more familiar idea from other systems. In ClickHouse, the more important design conversation is usually about sort order.

See what QueryPlane can build for you

Connect to your database, write SQL with AI, and build shareable apps — all from your browser.

Get Started Book a Demo

How to choose PARTITION BY

Good partition keys are usually:

stable
coarse-grained
aligned with retention or operational maintenance

Common examples:

month
day for especially large time-series tables
tenant or region in carefully chosen cases

Bad partition keys are often:

too granular
too high-cardinality
chosen only because they appear in filters

Over-partitioning creates operational pain without necessarily improving query speed.

How to choose ORDER BY

Good ordering keys reflect common filter patterns.

A simple rule:

put frequently filtered columns in the sort key
often place lower-cardinality filter dimensions earlier
preserve enough uniqueness or detail later in the key to keep row order useful

Do not design the key around one rare query. Design it around the dominant workload.

When PRIMARY KEY should differ from ORDER BY

This is an advanced move, not the default.

Use a narrower PRIMARY KEY when:

a smaller sparse index is enough for pruning
the full sort key still needs extra trailing columns for layout reasons

But for many teams, starting with only ORDER BY is simpler and safer until you have a reason to optimize further.

Common mistakes

Overusing partitions

Partitioning is valuable, but it is not the main accelerator for most analytical queries.

Treating PRIMARY KEY like a uniqueness constraint

That is not the right mental model in ClickHouse.

Choosing ORDER BY for one edge case

Your main workloads should drive the sort key, not a single fancy query.

Copying row-store habits directly

ClickHouse rewards layout-first thinking much more than traditional transactional databases do.

A good decision process

When designing a MergeTree table:

List the most common WHERE filters.
Pick a coarse partition key for lifecycle and maintenance.
Pick an ORDER BY that helps the dominant filters.
Only then decide whether you need a separate PRIMARY KEY.

That order leads to much better schemas than starting from relational uniqueness concepts.

Frequently asked questions

What is the difference between PARTITION BY and ORDER BY in ClickHouse? PARTITION BY decides how parts are grouped on disk for lifecycle operations — dropping old months, applying TTL, attaching backfills. ORDER BY decides how rows are physically sorted inside each part, which controls the primary index and most query performance. PARTITION BY is for operations; ORDER BY is for reads.

Should I always use PARTITION BY in ClickHouse? No. Partitioning is a maintenance feature, not a query accelerator. Many teams default to PARTITION BY toYYYYMM(created_at) because data lifecycle aligns with calendar months, but if you do not need to drop or detach by partition, you can skip PARTITION BY entirely. Over-partitioning (one part per day or per hour) hurts more than it helps because part counts go up.

What is the PRIMARY KEY in ClickHouse and how does it differ from a row store? The ClickHouse PRIMARY KEY is a sparse index — it stores one entry per granule (8,192 rows by default), not one entry per row. It does not enforce uniqueness, and it has nothing to do with foreign keys. Its only job is to help the engine skip granules at read time. Treating it like a uniqueness constraint is the most common ClickHouse design mistake.

Can ORDER BY and PRIMARY KEY be different in ClickHouse? Yes, but the PRIMARY KEY must be a prefix of ORDER BY. The pattern is useful when you want a longer sort key for query layout but a shorter index key to keep the index small. If you do not specify PRIMARY KEY explicitly, ClickHouse uses the full ORDER BY as the primary key.

How do I choose the right ORDER BY in ClickHouse? Start with the columns most frequently used in WHERE predicates, ordered from low cardinality to high cardinality (e.g., (tenant_id, event_date, user_id)). The leading column should be the one you filter on most often. Avoid ordering by a high-cardinality column first — the index pays for itself when the leading column has lots of repeated values that prune well.

How many partitions should a ClickHouse table have? Hundreds at most, not thousands. Each partition is a separate set of parts on disk, and the engine works harder to merge and read across many partitions. Monthly partitioning is a safe default for time-series tables; daily partitioning is only worth it if you genuinely drop by day.

Does PRIMARY KEY create a unique index in ClickHouse? No. There is no enforced uniqueness in MergeTree. If you need deduplication, use ReplacingMergeTree with a version column or run an explicit dedup step before insert. Application-side uniqueness is the standard pattern.

Why is my ClickHouse query slow even though it has the right WHERE clause? Almost always because the sort key does not match the predicate, the predicate is on a column not in ORDER BY, or the leading column of the sort key is too high-cardinality to prune effectively. Run EXPLAIN indexes = 1 on the query to see how many granules ClickHouse expects to scan after pruning — if that number is close to the total, the index is not earning its keep.

Wrapping up

PARTITION BY, ORDER BY, and PRIMARY KEY all matter in ClickHouse, but they solve different problems:

PARTITION BY is for operational grouping
ORDER BY is for physical layout and query speed
PRIMARY KEY is for sparse index pruning

If you internalize that, a lot of ClickHouse design decisions become easier. And if you are using engines like ReplacingMergeTree, the importance of ORDER BY becomes even higher because it also affects replacement semantics.

Pair this guide with our ReplacingMergeTree guide, projections guide, and TTL and data skipping indexes guide if you are designing production tables. And for the day-to-day workflow side, see our guide to the best ClickHouse GUI tools.