An Intro to ClickHouse LowCardinality

Learn when to use LowCardinality in ClickHouse, why it speeds up queries, and where teams overuse it or apply it to the wrong columns.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.


LowCardinality is one of the most useful ClickHouse data type wrappers because it improves both storage efficiency and query speed for the right kind of column. It is also one of the easiest features to overuse once you see it in examples.

The short version is simple: LowCardinality(String) is often a great fit for repeated string values like country codes, event types, status labels, and environment names. The longer version is where the real engineering judgment lives.

This guide covers when to use it and when not to. I also verified the example below locally with clickhouse local on ClickHouse 26.4.1.1039 so we are not hand-waving the syntax.

In this post, we’ll cover:

  • What LowCardinality does
  • Why it can improve performance
  • Good column candidates
  • Bad column candidates
  • How it relates to wider schema design

What LowCardinality means

LowCardinality(T) is a wrapper type in ClickHouse that uses dictionary encoding for repeated values. Instead of storing the full value repeatedly, ClickHouse stores a dictionary of distinct values and lightweight references to those values.

That is why it is often effective for strings with a relatively limited set of repeated values.

Examples:

  • country_code
  • event_type
  • status
  • region
  • environment

This can reduce storage and also speed up grouping and filtering, because ClickHouse can compare and hash compact dictionary references instead of full string values.
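
To make the dictionary idea concrete, here is a minimal sketch. It assumes a throwaway table named lc_demo (hypothetical, not part of this post’s example) and uses the built-in lowCardinalityIndices function to show the dictionary position stored for each row. Dictionaries are kept per data part, so the exact index numbers can differ between parts and are not meaningful on their own.

```sql
-- Hypothetical scratch table, used only to illustrate dictionary encoding.
CREATE TABLE lc_demo
(
  environment LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO lc_demo VALUES ('prod'), ('prod'), ('staging'), ('prod'), ('dev');

-- Each row carries a small integer reference into the dictionary;
-- rows holding the same string share the same reference within a part.
SELECT
  environment,
  lowCardinalityIndices(environment) AS dict_index
FROM lc_demo;
```

The point of the sketch is only that repeated strings collapse to repeated small integers. Treat the index values as an implementation detail rather than something to query in production.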

Why it helps

There are two main wins:

  • less repeated storage
  • faster operations on repeated categorical values

ClickHouse’s own query optimization material calls out LowCardinality as an important performance tool because data types affect both size and execution speed. In practice, that means the right type choice is not cosmetic. It changes how much data ClickHouse has to move and compare.

If a string column has many repeated values, LowCardinality(String) is often one of the highest-leverage low-effort schema improvements you can make.

A simple example

CREATE TABLE events
(
  event_time DateTime,
  environment LowCardinality(String),
  event_type LowCardinality(String),
  country_code LowCardinality(String),
  user_id UInt64
)
ENGINE = MergeTree
ORDER BY (event_time, user_id);

The table definition works as written. I loaded a few rows and grouped by the encoded columns:

INSERT INTO events VALUES
  ('2026-04-01 10:00:00', 'prod', 'page_view', 'US', 101),
  ('2026-04-01 10:01:00', 'prod', 'purchase', 'US', 101),
  ('2026-04-01 10:02:00', 'staging', 'page_view', 'DE', 102),
  ('2026-04-01 10:03:00', 'prod', 'page_view', 'US', 103),
  ('2026-04-01 10:04:00', 'dev', 'signup', 'IN', 104),
  ('2026-04-01 10:05:00', 'prod', 'purchase', 'DE', 105);

SELECT
  environment,
  event_type,
  count() AS event_count
FROM events
GROUP BY environment, event_type
ORDER BY environment, event_type;

Local result:

dev      signup     1
prod     page_view  2
prod     purchase   2
staging  page_view  1

And DESCRIBE TABLE events confirms that ClickHouse keeps the columns as LowCardinality(String):

event_time    DateTime
environment   LowCardinality(String)
event_type    LowCardinality(String)
country_code  LowCardinality(String)
user_id       UInt64

This is a good use of LowCardinality because:

  • environment usually has very few values
  • event_type is often repetitive
  • country_code has a bounded set of repeated values

These are exactly the kinds of columns you group by, filter on, and display frequently in analytical workloads.
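
As a small illustration of the filter-and-group pattern, the query below runs against the events table and the six rows inserted above. It is a sketch of a typical dashboard-style query, not a benchmark.

```sql
-- Filter on one LowCardinality column and group by another.
SELECT
  country_code,
  count() AS event_count
FROM events
WHERE environment = 'prod'
GROUP BY country_code
ORDER BY event_count DESC;

-- With the six sample rows above, four are 'prod':
-- three with country_code 'US' and one with 'DE'.
```

Nothing in the query mentions the wrapper. The filter and the grouping are written exactly as they would be for a plain String column.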

Good candidates for LowCardinality

Use it on columns that are:

  • repeated frequently
  • categorical rather than unique
  • commonly filtered or grouped
  • usually strings or similar symbolic values

Strong examples:

  • status
  • region
  • device_type
  • plan_name
  • tenant_tier
  • log_level

These are the columns that appear over and over in observability, analytics, and event pipelines.

Bad candidates for LowCardinality

Do not reach for it blindly.

Weak candidates include:

  • very high-cardinality identifiers
  • columns where most values are unique
  • long free-form text
  • columns that are naturally numeric and already compact

For example, wrapping email or session_id in LowCardinality(String) is usually not the win people hope for if those values are near-unique.

The name of the type is the hint. It is meant for low-cardinality columns.

A practical rule of thumb

Ask this question:

“Will this column repeat enough that encoding the distinct values is a meaningful win?”

If the answer is yes, LowCardinality is worth considering.

If the answer is “mostly every row is different,” it probably is not.
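
One way to put a number on that question is to compare distinct values with total rows. The sketch below assumes the events table from the example above. On a large table you may prefer the approximate uniq() over uniqExact() to keep the check cheap.

```sql
-- Ratio close to 0: heavily repeated values, a likely LowCardinality fit.
-- Ratio close to 1: nearly unique values, probably not worth wrapping.
SELECT
  uniqExact(event_type) AS distinct_values,
  count() AS total_rows,
  round(distinct_values / total_rows, 4) AS distinct_ratio
FROM events;
```

On a six-row sample the ratio says very little. Run the check against a representative slice of real data before changing a schema.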

LowCardinality and schema readability

One underrated benefit of using LowCardinality deliberately is that it makes your schema communicate intent. When someone sees:

status LowCardinality(String)

they immediately learn something about the shape of the data. This is a categorical field with a bounded repeated domain.

That helps future schema maintenance and query design. Good schemas are not just fast. They are legible.

Common mistakes

Wrapping every string column

This is the classic anti-pattern. LowCardinality is not a default for all text.

Ignoring query patterns

Some columns are low-cardinality but barely used. Others are low-cardinality and constantly grouped, filtered, and displayed. The second group benefits much more.

Forgetting that sort order still matters more

LowCardinality helps, but it does not replace good table design. ORDER BY and PRIMARY KEY choices still have a much larger impact on many query workloads. If that part is wrong, wrapping types alone will not save you.
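
As a rough sketch of what that means, suppose (as an assumption, not something measured in this post) that most queries filter on environment and event_type before narrowing by time. A hypothetical variant of the example table could then lead its sort key with those columns:

```sql
-- Hypothetical variant of the events table. The sort key leads with the
-- columns that queries are assumed to filter on most often.
CREATE TABLE events_by_env
(
  event_time DateTime,
  environment LowCardinality(String),
  event_type LowCardinality(String),
  country_code LowCardinality(String),
  user_id UInt64
)
ENGINE = MergeTree
ORDER BY (environment, event_type, event_time);
```

The column types are identical to the earlier example. Only the ORDER BY differs, and that is the part that decides how much data a filtered query has to read.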

Treating it as a substitute for normalization decisions

It is a storage and performance tool, not a modeling philosophy.

LowCardinality vs Enum

This question comes up often. In general:

  • Enum is useful when the set of values is tightly controlled and stable
  • LowCardinality(String) is more flexible when values may grow or change over time

For evolving analytics schemas, LowCardinality(String) is usually the safer and more maintainable default unless you have a strong reason to lock the value set down.
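
The difference shows up as soon as a new value arrives. The sketch below uses two hypothetical tables, requests_enum and requests_lc, that are not part of this post’s example.

```sql
-- Enum: the allowed values are part of the column type.
CREATE TABLE requests_enum
(
  method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4)
)
ENGINE = MergeTree
ORDER BY tuple();

-- A value outside the definition is rejected with an exception:
-- INSERT INTO requests_enum VALUES ('PATCH');

-- Accepting it requires a schema change first.
ALTER TABLE requests_enum
  MODIFY COLUMN method Enum8('GET' = 1, 'POST' = 2, 'PUT' = 3, 'DELETE' = 4, 'PATCH' = 5);

-- LowCardinality(String): new values are accepted without any ALTER.
CREATE TABLE requests_lc
(
  method LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO requests_lc VALUES ('GET'), ('PATCH');
```

Whether the rejection is a feature or a nuisance depends on how much you trust the producers writing into the table.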

When it is especially worth it

LowCardinality often pays off quickly in:

  • observability/event tables
  • dimensional attributes with repeated labels
  • analytics marts with lots of grouping on categorical fields
  • schemas migrated from row-store systems that defaulted too many fields to plain String

Frequently asked questions

What is LowCardinality in ClickHouse?

LowCardinality(T) is a data type wrapper that stores a dictionary of distinct values and replaces each row’s value with an integer index into that dictionary. Dictionaries are kept per data part, so when most rows share a few hundred or a few thousand distinct values, the on-disk size shrinks sharply and equality and IN comparisons can run against the dictionary instead of the raw bytes. The wrapper is transparent: SELECT and WHERE work as if the column were a plain String.

When should I use LowCardinality(String) vs plain String in ClickHouse?

Use LowCardinality(String) for columns with a limited set of distinct values that repeat heavily across rows. ClickHouse’s documentation notes that it mostly helps when the dictionary holds fewer than 10,000 distinct values and can perform worse than the ordinary type once it holds more than 100,000. Typical fits include status fields, country codes, event names, log levels, environment labels, and customer plan tiers. Plain String is the right choice when the column is effectively unique per row (IDs, UUIDs, free-form text, raw URLs), because the dictionary then holds nearly as many entries as there are rows and only adds overhead.

How does LowCardinality compare to ClickHouse’s Enum type?

Enum declares the set of valid values in the column type and stores each value as a small integer code (one byte for Enum8, two for Enum16). It can be marginally faster and more compact than LowCardinality for tight, stable sets (HTTP methods, boolean-like states), and it rejects writes of unknown values, which adds a useful schema constraint. LowCardinality(String) is more flexible because new values can appear at any time without an ALTER TABLE, which usually matters more for evolving analytics schemas than the marginal speed difference.

Does LowCardinality affect sort key or primary key performance?

The sort key and primary key still dominate scan cost. LowCardinality reduces per-value work and helps GROUP BY and equality filters, but it does not change how ClickHouse skips granules. If your queries always filter by tenant_id, putting tenant_id in ORDER BY matters far more than wrapping it in LowCardinality. The two are complementary: LowCardinality makes the columns the engine touches cheaper, while the sort key reduces how much data it has to touch in the first place.

Can I add LowCardinality to an existing column?

Yes. ALTER TABLE events MODIFY COLUMN status LowCardinality(String) changes the type in place. ClickHouse runs the conversion as a mutation that reads the entire column and writes it back, so it is expensive on a large table; check system.mutations to monitor progress. For large tables, a safer pattern is to create a new table with the target type, backfill it with INSERT ... SELECT, swap the table names, and drop the old table.
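
A minimal sketch of the in-place route, assuming a hypothetical logs table with a plain String status column (the events table in this post has no status column):

```sql
-- Hypothetical table that starts out with a plain String column.
CREATE TABLE logs
(
  ts DateTime,
  status String
)
ENGINE = MergeTree
ORDER BY ts;

-- Change the column type; existing parts are rewritten for this column.
ALTER TABLE logs MODIFY COLUMN status LowCardinality(String);

-- Check whether any rewrite work for this table is still pending.
SELECT
  mutation_id,
  command,
  is_done,
  parts_to_do,
  latest_fail_reason
FROM system.mutations
WHERE database = currentDatabase()
  AND table = 'logs';

-- Confirm the new type.
DESCRIBE TABLE logs;
```

On an empty or tiny table this finishes immediately. The cost only becomes visible with real data, so try it on a copy before running it against a production table.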

Does LowCardinality work with Nullable?

Yes. LowCardinality(Nullable(String)) is supported, but Nullable adds NULL-handling overhead and is rarely needed for the kinds of categorical columns LowCardinality targets. The conventional advice is to use a sentinel like the empty string instead of Nullable when the column is naturally never absent, which keeps both the storage and the query logic simpler.

Is there a cardinality threshold above which LowCardinality hurts performance?

ClickHouse’s documentation gives two rough markers: below about 10,000 distinct values LowCardinality is mostly a win, and above about 100,000 it can perform worse than the ordinary type. The practical answer is to measure compressed size and query latency. With a very large dictionary, the indirection overhead can outweigh the storage savings. The fastest way to check is to compare formatReadableSize(sum(data_compressed_bytes)) from system.columns with and without the wrapper on a representative dataset.
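
A sketch of that measurement, assuming two hypothetical tables (events_plain and events_lc) filled with the same synthetic data. Perfectly periodic synthetic values compress unusually well in both layouts, so treat this as a template for the comparison and substitute a sample of real data before drawing conclusions.

```sql
-- Same column, two encodings. Table names are hypothetical.
CREATE TABLE events_plain
(
  event_type String
)
ENGINE = MergeTree
ORDER BY tuple();

CREATE TABLE events_lc
(
  event_type LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY tuple();

-- One million rows cycling through four labels.
INSERT INTO events_plain
SELECT arrayElement(['page_view', 'purchase', 'signup', 'logout'], number % 4 + 1)
FROM numbers(1000000);

INSERT INTO events_lc
SELECT event_type FROM events_plain;

-- Compare on-disk size of the column under each type.
SELECT
  table,
  name,
  type,
  formatReadableSize(sum(data_compressed_bytes)) AS compressed,
  formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed
FROM system.columns
WHERE database = currentDatabase()
  AND table IN ('events_plain', 'events_lc')
GROUP BY table, name, type
ORDER BY table;
```

Size is only half of the picture. Run the queries you actually care about against both tables and compare latency as well.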

Wrapping up

LowCardinality is a great example of what makes ClickHouse powerful: seemingly small schema decisions can have outsized impact on analytical performance. Used on the right columns, it improves storage efficiency and speeds up common query patterns. Used indiscriminately, it just adds noise.

The best approach is simple:

  • apply it to repeated categorical values
  • skip it for mostly unique text
  • remember that sort key design still matters more

If you are tuning table layouts more broadly, pair this with our guide to ClickHouse PARTITION BY, ORDER BY, and PRIMARY KEY and our TTL and data skipping indexes guide. LowCardinality columns are exactly the kind of columns a set or bloom_filter skip index can prune effectively. And if you want a faster way to inspect schemas and experiment with these patterns, see our guide to the best ClickHouse GUI tools.