An Intro to ClickHouse LowCardinality
Learn when to use LowCardinality in ClickHouse, why it speeds up queries, and where teams overuse it or apply it to the wrong columns.
This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.
LowCardinality is one of the most useful ClickHouse data type wrappers because it improves both storage efficiency and query speed for the right kind of column. It is also one of the easiest features to overuse once you see it in examples.
The short version is simple: LowCardinality(String) is often a great fit for repeated string values like country codes, event types, status labels, and environment names. The longer version is where the real engineering judgment lives.
This guide covers when to use it and when not to. I also verified the example below locally with clickhouse local on ClickHouse 26.4.1.1039 so we are not hand-waving the syntax.
In this post, we’ll cover:
- What LowCardinality does
- Why it can improve performance
- Good column candidates
- Bad column candidates
- How it relates to wider schema design
What LowCardinality means
LowCardinality(T) is a wrapper type in ClickHouse that uses dictionary encoding for repeated values. Instead of storing the full value repeatedly, ClickHouse stores a dictionary of distinct values and lightweight references to those values.
That is why it is often effective for strings with a relatively limited set of repeated values.
Examples:
- country_code
- event_type
- status
- region
- environment
This can reduce storage and also speed up operations like grouping and filtering because ClickHouse can work efficiently with encoded values.
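As a rough mental model, the idea of "a dictionary plus lightweight references" can be mimicked with array functions. This is a conceptual sketch only: the literal values and the aliases raw_values, dict_values, and refs are made up for illustration, and the query does not show how ClickHouse lays out LowCardinality data internally.

```sql
-- Conceptual illustration of dictionary encoding, not ClickHouse's storage format.
WITH
    ['prod', 'prod', 'staging', 'prod', 'dev', 'prod'] AS raw_values,
    arrayDistinct(raw_values) AS dict_values
SELECT
    -- the distinct values, each stored once
    dict_values,
    -- one small integer per row; indexOf is 1-based
    arrayMap(v -> indexOf(dict_values, v), raw_values) AS refs;
```

Six strings collapse into three dictionary entries plus six small integers, which is the shape of the saving on a column with heavy repetition.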
Why it helps
There are two main wins:
- less repeated storage
- faster operations on repeated categorical values
ClickHouse’s own query optimization material calls out LowCardinality as an important performance tool because data types affect both size and execution speed. In practice, that means the right type choice is not cosmetic. It changes how much data ClickHouse has to move and compare.
If a string column has many repeated values, LowCardinality(String) is often one of the highest-leverage low-effort schema improvements you can make.
A simple example
CREATE TABLE events
(
    event_time DateTime,
    environment LowCardinality(String),
    event_type LowCardinality(String),
    country_code LowCardinality(String),
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY (event_time, user_id);
The table definition works as written. I loaded a few rows and grouped by the encoded columns:
INSERT INTO events VALUES
    ('2026-04-01 10:00:00', 'prod', 'page_view', 'US', 101),
    ('2026-04-01 10:01:00', 'prod', 'purchase', 'US', 101),
    ('2026-04-01 10:02:00', 'staging', 'page_view', 'DE', 102),
    ('2026-04-01 10:03:00', 'prod', 'page_view', 'US', 103),
    ('2026-04-01 10:04:00', 'dev', 'signup', 'IN', 104),
    ('2026-04-01 10:05:00', 'prod', 'purchase', 'DE', 105);

SELECT
    environment,
    event_type,
    count() AS event_count
FROM events
GROUP BY environment, event_type
ORDER BY environment, event_type;
Local result:
dev signup 1
prod page_view 2
prod purchase 2
staging page_view 1
And DESCRIBE TABLE events confirms that ClickHouse keeps the columns as LowCardinality(String):
event_time DateTime
environment LowCardinality(String)
event_type LowCardinality(String)
country_code LowCardinality(String)
user_id UInt64
This is a good use of LowCardinality because:
- environment usually has very few values
- event_type is often repetitive
- country_code has a bounded set of repeated values
These are exactly the kinds of columns you group by, filter on, and display frequently in analytical workloads.
Good candidates for LowCardinality
Use it on columns that are:
- repeated frequently
- categorical rather than unique
- commonly filtered or grouped
- usually strings or similar symbolic values
Strong examples:
- status
- region
- device_type
- plan_name
- tenant_tier
- log_level
These are the columns that appear over and over in observability, analytics, and event pipelines.
Bad candidates for LowCardinality
Do not reach for it blindly.
Weak candidates include:
- very high-cardinality identifiers
- columns where most values are unique
- long free-form text
- columns that are naturally numeric and already compact
For example, wrapping email or session_id in LowCardinality(String) is usually not the win people hope for if those values are near-unique.
The name of the type is the hint. It is meant for low-cardinality columns.
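A minimal sketch of how that split might look in one table. The table name sessions_sketch and its columns are hypothetical, chosen only to contrast near-unique fields with a categorical one.

```sql
-- Hypothetical table: names are illustrative, not from a real schema.
CREATE TABLE sessions_sketch
(
    session_id String,                   -- near-unique per row: keep plain
    email String,                        -- near-unique per row: keep plain
    user_id UInt64,                      -- numeric and already compact: keep plain
    device_type LowCardinality(String)   -- a handful of repeated labels: good fit
)
ENGINE = MergeTree
ORDER BY (device_type, session_id);
```

ClickHouse also nudges you in this direction for small numeric types: wrapping fixed-size types of 8 bytes or less in LowCardinality is restricted unless the allow_suspicious_low_cardinality_types setting is enabled.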
A practical rule of thumb
Ask this question:
“Will this column repeat enough that encoding the distinct values is a meaningful win?”
If the answer is yes, LowCardinality is worth considering.
If the answer is “mostly every row is different,” it probably is not.
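One way to put a number on that question is to compare distinct values against total rows. The sketch below assumes the events table from the earlier example; swap in your own table and column.

```sql
-- A low distinct_ratio means heavy repetition, which is what LowCardinality rewards.
SELECT
    uniqExact(event_type) AS distinct_values,
    count() AS total_rows,
    round(distinct_values / total_rows, 4) AS distinct_ratio
FROM events;
```

On a large table, uniq(event_type) gives an approximate count at lower cost, which is usually enough for this decision.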
LowCardinality and schema readability
One underrated benefit of using LowCardinality deliberately is that it makes your schema communicate intent. When someone sees:
status LowCardinality(String)
they immediately learn something about the shape of the data. This is a categorical field with a bounded repeated domain.
That helps future schema maintenance and query design. Good schemas are not just fast. They are legible.
Common mistakes
Wrapping every string column
This is the classic anti-pattern. LowCardinality is not a default for all text.
Ignoring query patterns
Some columns are low-cardinality but barely used. Others are low-cardinality and constantly grouped, filtered, and displayed. The second group benefits much more.
Forgetting that sort order still matters more
LowCardinality helps, but it does not replace good table design. ORDER BY and PRIMARY KEY choices still have a much larger impact on many query workloads. If that part is wrong, wrapping types alone will not save you.
Treating it as a substitute for normalization decisions
It is a storage and performance tool, not a modeling philosophy.
LowCardinality vs Enum
This question comes up often. In general:
- Enum is useful when the set of values is tightly controlled and stable
- LowCardinality(String) is more flexible when values may grow or change over time
For evolving analytics schemas, LowCardinality(String) is usually the safer and more maintainable default unless you have a strong reason to lock the value set down.
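A small sketch of the difference in practice. The table requests_sketch and its columns are hypothetical; the point is only that a new Enum value needs a schema change while a new LowCardinality value does not.

```sql
-- Hypothetical table contrasting a fixed value set with an open one.
CREATE TABLE requests_sketch
(
    method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4),
    route_group LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY route_group;

-- A new route_group value needs no schema change.
INSERT INTO requests_sketch VALUES ('GET', 'checkout'), ('POST', 'search');

-- A new method value must be added to the Enum definition first.
ALTER TABLE requests_sketch
    MODIFY COLUMN method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4, 'PATCH' = 5);
```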
When it is especially worth it
LowCardinality often pays off quickly in:
- observability/event tables
- dimensional attributes with repeated labels
- analytics marts with lots of grouping on categorical fields
- schemas migrated from row-store systems that defaulted too many fields to plain String
Frequently asked questions
What is LowCardinality in ClickHouse?
LowCardinality(T) is a data type wrapper that stores a dictionary of distinct values and replaces each row’s value with an integer index into that dictionary. The dictionary is held per-part and reused across the column, so when most rows share a few hundred or few thousand distinct values, the on-disk size collapses and equality/IN operations run against the dictionary instead of the raw bytes. The wrapper is transparent: SELECT and WHERE work as if the column were a plain String.
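To see that transparency, the queries below assume the events table and the six sample rows from the earlier example. The filter is written exactly as it would be for a plain String column.

```sql
-- Plain string literal in the filter; no cast or special function needed.
-- With the six sample rows, four of them have environment = 'prod'.
SELECT count() AS prod_events
FROM events
WHERE environment = 'prod';

-- The declared type is still visible if you ask for it.
SELECT DISTINCT toTypeName(environment) AS column_type
FROM events;
```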
When should I use LowCardinality(String) vs plain String in ClickHouse?
Use LowCardinality(String) for columns whose values repeat heavily across rows. ClickHouse’s documentation gives two reference points: with fewer than 10,000 distinct values the wrapper is mostly more efficient to read and store, and with more than 100,000 it can perform worse than the plain type. Typical fits include status fields, country codes, event names, log levels, environment labels, and customer plan tiers. Plain String is the right choice when the column is effectively unique per row (IDs, UUIDs, free-form text, raw URLs), because in those cases the dictionary holds nearly as many entries as there are rows and only adds overhead.
How does LowCardinality compare to ClickHouse’s Enum type?
Enum fixes the set of valid values at table-create time and stores each value as a small integer code. It is faster than LowCardinality for tight, stable sets (HTTP methods, boolean-like states) and rejects writes for unknown values, which adds a useful schema constraint. LowCardinality(String) is more flexible because new values can appear at any time without an ALTER TABLE, which usually matters more for evolving analytics schemas than the marginal speed difference.
Does LowCardinality affect sort key or primary key performance?
The sort key and primary key still dominate scan cost. LowCardinality reduces per-value work and helps GROUP BY and equality filters, but it does not change how ClickHouse skips granules. If your queries always filter by tenant_id, putting tenant_id in ORDER BY matters far more than wrapping it in LowCardinality. The two are complementary — LowCardinality makes the columns the engine touches cheaper; the sort key reduces how many it has to touch in the first place.
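A hedged sketch of the tenant_id case. The table tenant_events_sketch and the tenant value 42 are made up; EXPLAIN indexes = 1 is used to inspect how much the primary key prunes for a given filter.

```sql
-- Hypothetical table: tenant_id leads the sort key so tenant filters can skip granules.
CREATE TABLE tenant_events_sketch
(
    tenant_id UInt32,
    event_time DateTime,
    event_type LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (tenant_id, event_time);

-- Shows which parts and granules the primary key selects for this filter.
EXPLAIN indexes = 1
SELECT event_type, count()
FROM tenant_events_sketch
WHERE tenant_id = 42
GROUP BY event_type;
```

Here the sort key decides how many granules are read, and LowCardinality makes the GROUP BY on event_type cheaper for the rows that remain.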
Can I add LowCardinality to an existing column?
Yes. ALTER TABLE events MODIFY COLUMN status LowCardinality(String) converts the column through a mutation. The operation reads the entire column and writes it back, so it is expensive on a large table; check system.mutations to monitor progress. For large tables the safer pattern is to create a new table with the target schema, backfill it with INSERT ... SELECT, swap the table names, and drop the old table.
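A minimal sketch of the in-place route, using a hypothetical events_plain_sketch table that starts with a plain String status column. On a real table, try this on a copy or a small partition first.

```sql
-- Hypothetical starting point: status is a plain String.
CREATE TABLE events_plain_sketch
(
    event_time DateTime,
    status String
)
ENGINE = MergeTree
ORDER BY event_time;

-- Rewrites the status column with the new type.
ALTER TABLE events_plain_sketch
    MODIFY COLUMN status LowCardinality(String);

-- Check whether any conversion work is still pending for this table.
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE database = currentDatabase() AND table = 'events_plain_sketch';
```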
Does LowCardinality work with Nullable?
Yes. LowCardinality(Nullable(String)) is supported, but Nullable adds NULL-handling overhead and is rarely needed for the kinds of categorical columns LowCardinality targets. The conventional advice is to use a sentinel like the empty string instead of Nullable when the column is naturally never absent, because both the storage and the query plans get simpler.
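The two options side by side, as a sketch. The table nullable_sketch and its columns are hypothetical; region uses an empty-string sentinel, while referrer keeps a real NULL because "absent" is assumed to be meaningful there.

```sql
-- Hypothetical table showing both styles.
CREATE TABLE nullable_sketch
(
    event_time DateTime,
    region LowCardinality(String) DEFAULT '',    -- sentinel instead of NULL
    referrer LowCardinality(Nullable(String))    -- NULL kept on purpose
)
ENGINE = MergeTree
ORDER BY event_time;

-- Each style needs its own "missing" check.
SELECT
    countIf(region = '') AS missing_region,
    countIf(referrer IS NULL) AS missing_referrer
FROM nullable_sketch;
```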
Is there a cardinality threshold above which LowCardinality hurts performance?
ClickHouse’s documentation gives two reference points rather than a single threshold: below roughly 10,000 distinct values LowCardinality is mostly a win, and above roughly 100,000 it can perform worse than the plain type. In between, the practical answer is: measure compressed size and query latency. As the dictionary grows, the indirection overhead starts to outweigh the storage savings. The fastest way to check is SELECT formatReadableSize(sum(data_compressed_bytes)) FROM system.columns with and without the wrapper on a representative dataset.
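A slightly fuller version of that check, broken out per column. It assumes the events table from the earlier example lives in the current database; run it against both variants of the table and compare the rows for the column in question.

```sql
-- Per-column on-disk size for one table, largest first.
SELECT
    name,
    type,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
ORDER BY data_compressed_bytes DESC;
```

Sizes on a six-row sample table are not meaningful, so load a representative slice of real data before drawing conclusions.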
Wrapping up
LowCardinality is a great example of what makes ClickHouse powerful: seemingly small schema decisions can have outsized impact on analytical performance. Used on the right columns, it improves storage efficiency and speeds up common query patterns. Used indiscriminately, it just adds noise.
The best approach is simple:
- apply it to repeated categorical values
- skip it for mostly unique text
- remember that sort key design still matters more
If you are tuning table layouts more broadly, pair this with our guide to ClickHouse PARTITION BY, ORDER BY, and PRIMARY KEY and our TTL and data skipping indexes guide — LowCardinality columns are exactly the kind of columns a set or bloom_filter skip index can prune effectively. And if you want a faster way to inspect schemas and experiment with these patterns, see our guide to the best ClickHouse GUI tools.