ClickHouse PARTITION BY vs ORDER BY vs PRIMARY KEY
Learn what PARTITION BY, ORDER BY, and PRIMARY KEY mean in ClickHouse, how they differ, and how to choose them for analytical workloads.
ClickHouse
This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.
If you come to ClickHouse from PostgreSQL, MySQL, or Snowflake, one of the first confusing things you run into is table design syntax. PARTITION BY, ORDER BY, and PRIMARY KEY all affect how data is laid out and queried, but not in the same way they do in row-store databases.
This is also one of the highest-leverage design decisions in ClickHouse. ClickHouse’s own optimization guidance says sort key design can improve performance by 100x or more.
This guide explains the practical difference between these clauses and how to choose them.
In this post, we’ll cover:
- What PARTITION BY does
- What ORDER BY does
- What PRIMARY KEY does
- Why ORDER BY usually matters most
- How to choose each one
The short version
PARTITION BYcontrols how data is split into partitionsORDER BYcontrols physical sort order within partsPRIMARY KEYdefines the sparse index used for pruning, and must be a prefix ofORDER BYif specified separately
If you remember only one thing, remember this:
In ClickHouse, ORDER BY is often the most important decision.
What PARTITION BY is for
PARTITION BY groups data into separate partitions. This is useful for:
- lifecycle management
- dropping old data efficiently
- reducing the scope of some maintenance operations
- aligning storage with major time windows or tenant boundaries
A common example is monthly partitioning on a timestamp:
PARTITION BY toYYYYMM(event_time)
This is often a good fit for large event tables because it makes retention policies and operational maintenance simpler.
What PARTITION BY is not:
- it is not your main query-performance lever
- it is not a replacement for sort-key design
Too many people overuse partitioning when the real performance problem is poor ORDER BY.
What ORDER BY is for
ORDER BY defines the physical sort order of the data on disk. This is the clause that most directly shapes query performance because ClickHouse uses that order to skip data efficiently.
The ClickHouse team repeatedly emphasizes this point: the right sort key is one of the biggest performance levers in the system.
Why?
Because analytical queries usually filter on a few dimensions over a huge table. If those dimensions line up with the sort order, ClickHouse can skip large amounts of data instead of scanning almost everything.
For example:
ORDER BY (event_date, customer_id, event_time)
This can be excellent if your workload often filters by date and then customer.
What PRIMARY KEY means in ClickHouse
This is where people coming from PostgreSQL get tripped up.
In ClickHouse, PRIMARY KEY does not usually mean:
- unique constraint
- transactional entity identity
- relational integrity feature
Instead, it defines the sparse index used for pruning. In many tables, people do not specify it separately at all, and the ORDER BY key effectively acts as the primary key.
If you do specify PRIMARY KEY separately, it must be a prefix of ORDER BY.
That means a valid pattern looks like:
PRIMARY KEY (customer_id)
ORDER BY (customer_id, event_time)
This is useful when you want a narrower sparse index than the full sort key.
A practical example
CREATE TABLE page_views
(
event_time DateTime,
event_date Date,
customer_id UInt64,
region LowCardinality(String),
url String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY (customer_id, event_date, event_time);
How to read this:
- data is partitioned monthly
- within each part, rows are sorted by customer, then date, then time
- pruning works well for queries filtered by
customer_idand date ranges
This is a much more important performance design than simply partitioning by month and leaving the sort order weak.
Why ORDER BY usually matters most
ClickHouse’s optimization guidance is clear: proper ORDER BY selection has the single biggest impact on performance in many workloads.
That means:
- choose partitioning for operational boundaries
- choose ordering for query speed
This is the reverse of how many newcomers think about it. They obsess over partitions first because partitioning is the more familiar idea from other systems. In ClickHouse, the more important design conversation is usually about sort order.
See what QueryPlane can build for you
Connect to your database, write SQL with AI, and build shareable apps — all from your browser.
How to choose PARTITION BY
Good partition keys are usually:
- stable
- coarse-grained
- aligned with retention or operational maintenance
Common examples:
- month
- day for especially large time-series tables
- tenant or region in carefully chosen cases
Bad partition keys are often:
- too granular
- too high-cardinality
- chosen only because they appear in filters
Over-partitioning creates operational pain without necessarily improving query speed.
How to choose ORDER BY
Good ordering keys reflect common filter patterns.
A simple rule:
- put frequently filtered columns in the sort key
- often place lower-cardinality filter dimensions earlier
- preserve enough uniqueness or detail later in the key to keep row order useful
Do not design the key around one rare query. Design it around the dominant workload.
When PRIMARY KEY should differ from ORDER BY
This is an advanced move, not the default.
Use a narrower PRIMARY KEY when:
- a smaller sparse index is enough for pruning
- the full sort key still needs extra trailing columns for layout reasons
But for many teams, starting with only ORDER BY is simpler and safer until you have a reason to optimize further.
Common mistakes
Overusing partitions
Partitioning is valuable, but it is not the main accelerator for most analytical queries.
Treating PRIMARY KEY like a uniqueness constraint
That is not the right mental model in ClickHouse.
Choosing ORDER BY for one edge case
Your main workloads should drive the sort key, not a single fancy query.
Copying row-store habits directly
ClickHouse rewards layout-first thinking much more than traditional transactional databases do.
A good decision process
When designing a MergeTree table:
- List the most common
WHEREfilters. - Pick a coarse partition key for lifecycle and maintenance.
- Pick an
ORDER BYthat helps the dominant filters. - Only then decide whether you need a separate
PRIMARY KEY.
That order leads to much better schemas than starting from relational uniqueness concepts.
Frequently asked questions
What is the difference between PARTITION BY and ORDER BY in ClickHouse?
PARTITION BY decides how parts are grouped on disk for lifecycle operations — dropping old months, applying TTL, attaching backfills. ORDER BY decides how rows are physically sorted inside each part, which controls the primary index and most query performance. PARTITION BY is for operations; ORDER BY is for reads.
Should I always use PARTITION BY in ClickHouse?
No. Partitioning is a maintenance feature, not a query accelerator. Many teams default to PARTITION BY toYYYYMM(created_at) because data lifecycle aligns with calendar months, but if you do not need to drop or detach by partition, you can skip PARTITION BY entirely. Over-partitioning (one part per day or per hour) hurts more than it helps because part counts go up.
What is the PRIMARY KEY in ClickHouse and how does it differ from a row store?
The ClickHouse PRIMARY KEY is a sparse index — it stores one entry per granule (8,192 rows by default), not one entry per row. It does not enforce uniqueness, and it has nothing to do with foreign keys. Its only job is to help the engine skip granules at read time. Treating it like a uniqueness constraint is the most common ClickHouse design mistake.
Can ORDER BY and PRIMARY KEY be different in ClickHouse?
Yes, but the PRIMARY KEY must be a prefix of ORDER BY. The pattern is useful when you want a longer sort key for query layout but a shorter index key to keep the index small. If you do not specify PRIMARY KEY explicitly, ClickHouse uses the full ORDER BY as the primary key.
How do I choose the right ORDER BY in ClickHouse?
Start with the columns most frequently used in WHERE predicates, ordered from low cardinality to high cardinality (e.g., (tenant_id, event_date, user_id)). The leading column should be the one you filter on most often. Avoid ordering by a high-cardinality column first — the index pays for itself when the leading column has lots of repeated values that prune well.
How many partitions should a ClickHouse table have? Hundreds at most, not thousands. Each partition is a separate set of parts on disk, and the engine works harder to merge and read across many partitions. Monthly partitioning is a safe default for time-series tables; daily partitioning is only worth it if you genuinely drop by day.
Does PRIMARY KEY create a unique index in ClickHouse?
No. There is no enforced uniqueness in MergeTree. If you need deduplication, use ReplacingMergeTree with a version column or run an explicit dedup step before insert. Application-side uniqueness is the standard pattern.
Why is my ClickHouse query slow even though it has the right WHERE clause?
Almost always because the sort key does not match the predicate, the predicate is on a column not in ORDER BY, or the leading column of the sort key is too high-cardinality to prune effectively. Run EXPLAIN indexes = 1 on the query to see how many granules ClickHouse expects to scan after pruning — if that number is close to the total, the index is not earning its keep.
Wrapping up
PARTITION BY, ORDER BY, and PRIMARY KEY all matter in ClickHouse, but they solve different problems:
PARTITION BYis for operational groupingORDER BYis for physical layout and query speedPRIMARY KEYis for sparse index pruning
If you internalize that, a lot of ClickHouse design decisions become easier. And if you are using engines like ReplacingMergeTree, the importance of ORDER BY becomes even higher because it also affects replacement semantics.
Pair this guide with our ReplacingMergeTree guide, projections guide, and TTL and data skipping indexes guide if you are designing production tables. And for the day-to-day workflow side, see our guide to the best ClickHouse GUI tools.