
ClickHouse Backup and Restore in Practice

How to back up and restore ClickHouse: native BACKUP/RESTORE to S3, clickhouse-backup, FREEZE mechanics, incremental, and DR patterns.

ClickHouse

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.


Backing up ClickHouse is not like backing up Postgres. There is no equivalent of pg_dump that streams a logical dump and “just works”. Even if there were, dumping a 50 TB columnar warehouse as SQL INSERT statements would be a non-starter. ClickHouse’s storage layout gives you a much cheaper backup primitive than logical dumps: immutable parts assembled by background merges, column data stored per part and read in granules, and a hard-link-friendly directory structure. It also means you have to make a deliberate choice about which of three backup mechanisms you use and where you put the resulting bytes.

This post walks through the three production-ready backup paths on ClickHouse and shows when each is the right choice:

  • the built-in BACKUP/RESTORE SQL statements,
  • the clickhouse-backup tool from Altinity,
  • the older ALTER TABLE FREEZE PARTITION + filesystem-copy pattern, whose hard-link approach still underpins both.

The focus is on what you have to think about after you’ve done a happy-path backup once: incremental scheduling against S3, restoring to a new cluster, validating that a restore actually works, and the production pitfalls that turn a 6-hour outage into a 6-day outage.

In this post, we’ll cover:

  • The three backup mechanisms — native BACKUP/RESTORE, clickhouse-backup, and the underlying FREEZE PARTITION primitive
  • Native BACKUP/RESTORE — syntax, S3 destinations, incremental via base_backup, async monitoring
  • clickhouse-backup — when to pick it over native, watch mode, remote incremental, partition filters
  • FREEZE PARTITION mechanics — why hard links make backup so cheap, what the shadow/ directory really contains
  • Replicated tables and Keeper — what coordination is required, restoring to a brand-new cluster
  • Incremental backups in practice — chain length, restore math, when the savings are worth the complexity
  • Disaster recovery patterns — full vs partial restore, time-to-restore, the drill that catches everything else
  • Common pitfalls — RBAC, system.* tables, schema drift, ON CLUSTER quirks

The three backup mechanisms

Before picking a tool, it helps to know what each mechanism actually does at the filesystem level, because all three end up creating roughly the same artifact — a consistent snapshot of part directories — and the differences are mostly about who manages the lifecycle and where the bytes land.

Native BACKUP/RESTORE (docs) is built into ClickHouse since version 22.7. You issue a SQL statement like BACKUP TABLE events TO S3('https://bucket.s3.amazonaws.com/backups/2026-05-26', 'access_key', 'secret') and ClickHouse handles the freeze, copy, and metadata write itself. The output is a structured directory under the destination with metadata/, data/, and a .backup manifest file. This is the default choice for modern ClickHouse Cloud deployments and self-hosted clusters where you don’t have a specific reason to need anything else.

clickhouse-backup (GitHub, originally by Alex Akulov, now maintained by Altinity) is a standalone CLI binary that wraps FREEZE PARTITION and a remote-storage uploader. It predates the native BACKUP command by years and is still the most feature-rich option — it supports more storage backends (S3, GCS, Azure, SFTP, FTP, rclone, kopia, restic, rsync), has a watch mode for scheduled full + incremental cycles, and gives you finer control over partition filters and RBAC backup. It uses the same FREEZE primitive underneath, so the on-disk artifact is the same — what differs is the orchestration around it.

ALTER TABLE ... FREEZE PARTITION (docs) is the underlying primitive. clickhouse-backup drives it directly, and native BACKUP relies on the same hard-link technique internally. FREEZE creates hard links to all active part files for a partition (or all partitions, if you omit the PARTITION clause) inside /var/lib/clickhouse/shadow/N/. Because hard links share inodes with the originals, the freeze is instantaneous. It consumes no additional disk space until the original files get cleaned up by a merge. You then copy the shadow directory to durable storage with your tool of choice: rsync, aws s3 sync, rclone, anything that moves files.

This is the right choice when you have very specific operational constraints, such as a homegrown backup pipeline integrated with company-wide infrastructure. It suits teams that would rather assemble the moving parts themselves than depend on a backup tool’s lifecycle assumptions.

The default recommendation in 2026:

  • Use native BACKUP/RESTORE on ClickHouse Cloud or any self-hosted cluster running 23.x+.
  • Use clickhouse-backup when you need its specific extras (watch mode, broader storage backends, fine-grained partition selection).
  • Reach for raw FREEZE only when neither of those fits.

Native BACKUP/RESTORE

The simplest end-to-end example backs up a single table to S3:

BACKUP TABLE analytics.events TO S3(
  'https://my-backups.s3.us-east-1.amazonaws.com/clickhouse/events/2026-05-26',
  'AKIAIOSFODNN7EXAMPLE',
  'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
);

This blocks until the backup completes and returns a single row with the backup ID and status. The destination directory on S3 will contain a .backup manifest file at the root, plus a data/ subtree holding the part files. Everything is uploaded with the same checksums and structure ClickHouse uses on disk. The credentials in the second and third arguments are passed inline here for clarity. In production, use a named collection (docs) instead, so the secret doesn’t end up in system.query_log:

CREATE NAMED COLLECTION s3_backups AS
  url = 'https://my-backups.s3.us-east-1.amazonaws.com/clickhouse',
  access_key_id = 'AKIAIOSFODNN7EXAMPLE',
  secret_access_key = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';

BACKUP TABLE analytics.events TO S3(s3_backups, 'events/2026-05-26');

For a multi-table or whole-database backup the BACKUP statement takes a list:

BACKUP
  DATABASE analytics,
  TABLE telemetry.spans,
  TABLE telemetry.metrics
TO S3(s3_backups, 'full/2026-05-26')
SETTINGS compression_method = 'zstd', compression_level = 3;

The SETTINGS clause is where most of the production knobs live.

compression_method accepts zstd (the right default), lz4, lzma, bzip2, deflate_qpl, and a few others. zstd at level 3 is the standard analytics-warehouse-friendly point on the speed-vs-ratio curve.

Encryption is exposed via password = '...', but only for a zip-archive destination, not the raw S3 directory format. Examples are File('/tmp/backup.zip') or an S3 URL whose path ends in .zip. For at-rest encryption with the directory format, use SSE-KMS on the bucket itself instead.

For incremental backups, you pass base_backup to point at the previous full or incremental:

BACKUP TABLE analytics.events
TO S3(s3_backups, 'events/2026-05-27-incr')
SETTINGS base_backup = S3(s3_backups, 'events/2026-05-26');

ClickHouse will only upload parts that did not already exist in base_backup. For a steadily growing table with immutable parts, that is approximately the new data since the last backup, plus any parts that background merges have rewritten in the meantime.

The catch is that restoring an incremental backup requires every backup in the chain to still be present in backup storage. ClickHouse stitches the parts back together by reading the manifests, so if you delete the base backup, the incremental becomes useless. The standard production pattern is one weekly full + daily incrementals, with the previous week’s full retained until the new week’s full completes.
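To make the “only upload what changed” rule concrete, here is a toy model in Python. It assumes a backup can be described as a mapping from file path (inside a part directory) to checksum. That is a simplification for illustration, not ClickHouse’s real manifest format, and the part names are made up.

```python
from typing import Dict, Set

# Toy model: a snapshot maps "part_dir/file" to that file's checksum.
# It illustrates the base_backup idea; it is not ClickHouse's manifest format.
Snapshot = Dict[str, str]


def files_to_upload(current: Snapshot, base: Snapshot) -> Snapshot:
    """Files that are new, or whose checksum differs from the base backup."""
    return {path: csum for path, csum in current.items() if base.get(path) != csum}


def files_reused_from_base(current: Snapshot, base: Snapshot) -> Set[str]:
    """Files the incremental only references, so restoring it needs the base."""
    return {path for path, csum in current.items() if base.get(path) == csum}


base = {"all_1_1_0/data.bin": "aaa", "all_2_2_0/data.bin": "bbb"}

# New data arrived as part all_3_3_0: only that part is uploaded.
appended = {**base, "all_3_3_0/data.bin": "ccc"}
print(sorted(files_to_upload(appended, base)))  # ['all_3_3_0/data.bin']

# A merge rewrote all three parts into all_1_3_1: the merged part is uploaded in full.
merged = {"all_1_3_1/data.bin": "ddd"}
print(sorted(files_to_upload(merged, base)))  # ['all_1_3_1/data.bin']
```

The second case is why an incremental taken after heavy merging can be much larger than the raw ingest volume since the previous backup.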

Long backups — anything over a few minutes — should run async so you don’t tie up a SQL client:

BACKUP TABLE analytics.events
TO S3(s3_backups, 'events/2026-05-26')
ASYNC;

This returns an operation ID immediately. You poll system.backups to see status:

SELECT id, name, status, num_files, total_size, error
FROM system.backups
WHERE start_time > now() - INTERVAL 1 HOUR
ORDER BY start_time DESC;

Statuses move from CREATING_BACKUP to BACKUP_CREATED, or to BACKUP_FAILED with error populated. The same view is the right thing to wire into your monitoring. Alert on:

  • any row where status = 'BACKUP_FAILED' in the last 24 hours, or
  • the absence of a BACKUP_CREATED row for a database that should be backed up nightly.
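The alerting rule above is easy to get subtly wrong: alerting on failures alone misses the night the job never ran at all. Here is a minimal Python sketch of the decision logic. It assumes you have already fetched rows from system.backups with the query above. The BackupRow type and the 24-hour window are illustrative choices, and this version checks all rows together rather than one database at a time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class BackupRow:
    """The subset of system.backups columns this check looks at."""
    name: str
    status: str
    start_time: datetime


def backup_alerts(rows: List[BackupRow], now: Optional[datetime] = None,
                  window: timedelta = timedelta(hours=24)) -> List[str]:
    """Alert on failures inside the window, and on the absence of any success."""
    now = now or datetime.now(timezone.utc)
    recent = [r for r in rows if r.start_time >= now - window]
    alerts = [f"backup failed: {r.name}" for r in recent
              if r.status == "BACKUP_FAILED"]
    if not any(r.status == "BACKUP_CREATED" for r in recent):
        alerts.append("no successful backup in the window")
    return alerts
```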

RESTORE is the mirror image and lives in the same SQL grammar:

RESTORE TABLE analytics.events FROM S3(s3_backups, 'events/2026-05-26');

You can restore into a different database or table with AS:

RESTORE TABLE analytics.events AS sandbox.events_recovered
FROM S3(s3_backups, 'events/2026-05-26');

That AS form is the right pattern for validation. Instead of restoring into the production table and overwriting whatever is there:

  1. Restore into a sandbox table.
  2. Run a SELECT count(), min(timestamp), max(timestamp) sanity check.
  3. Only then run EXCHANGE TABLES analytics.events AND sandbox.events_recovered to swap them atomically. EXCHANGE requires both databases to use the Atomic engine, which is the default.

clickhouse-backup

clickhouse-backup is installed as a single Go binary. The standard install is to download the release tarball (for example with wget) or to run the official Docker image. Configuration lives in /etc/clickhouse-backup/config.yml and looks like:

general:
  remote_storage: s3
  backups_to_keep_local: 3
  backups_to_keep_remote: 30
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
s3:
  bucket: my-backups
  region: us-east-1
  path: clickhouse/
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  compression_format: zstd

The standard workflow is three commands — create makes a local backup (a FREEZE plus metadata copy under /var/lib/clickhouse/backup/), upload ships it to S3, and delete local cleans up the on-disk copy:

clickhouse-backup create 2026-05-26
clickhouse-backup upload 2026-05-26
clickhouse-backup delete local 2026-05-26
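The three commands only make sense in order, and delete local should never run if the upload failed. Below is a small Python wrapper that a cron job or systemd timer could call. The function names are hypothetical, and it assumes clickhouse-backup is on PATH and reads its default config file.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def backup_and_ship(name: str, runner: Runner = run) -> None:
    """create, then upload, then delete local, stopping at the first failure.

    The default runner raises on a non-zero exit, so the local copy is only
    removed after a successful upload and a failed upload can be retried.
    """
    runner(["clickhouse-backup", "create", name])
    runner(["clickhouse-backup", "upload", name])
    runner(["clickhouse-backup", "delete", "local", name])
```

The injectable runner exists so the ordering can be tested without a ClickHouse server.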

Restoring goes the other way:

clickhouse-backup download 2026-05-26
clickhouse-backup restore 2026-05-26

The two features that make this tool worth picking over native are watch mode and remote incremental.

watch runs a long-lived loop that takes a full backup on one schedule and incremental backups on another, with automatic retention pruning:

clickhouse-backup watch \
  --watch-interval=1h \
  --full-interval=24h \
  --watch-backup-name-template='{type}-{time:20060102150405}'

Running this under systemd or as a Kubernetes Deployment gives you a hands-off scheduled backup loop without writing a cron wrapper. Each hourly run takes an incremental against the most recent backup; each daily run takes a fresh full. Retention is handled by backups_to_keep_remote in the config — old backups age out automatically.
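To reason about what the watch loop will leave in the bucket, here is a simplified Python model of the schedule plus the name template. It assumes {type} expands to full or increment; check the exact labels your clickhouse-backup version uses. It ignores shard prefixes and retention. The Go reference layout 20060102150405 corresponds to %Y%m%d%H%M%S.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def backup_name(kind: str, at: datetime) -> str:
    """Python rendering of the template {type}-{time:20060102150405}."""
    return f"{kind}-{at.strftime('%Y%m%d%H%M%S')}"


def watch_plan(start: datetime, runs: int, watch_interval: timedelta,
               full_interval: timedelta) -> List[str]:
    """Simplified schedule: one backup per watch_interval, and a full one
    whenever full_interval has elapsed since the previous full."""
    names: List[str] = []
    last_full: Optional[datetime] = None
    for i in range(runs):
        at = start + i * watch_interval
        if last_full is None or at - last_full >= full_interval:
            kind, last_full = "full", at
        else:
            kind = "increment"
        names.append(backup_name(kind, at))
    return names


print(watch_plan(datetime(2026, 5, 26), 3, timedelta(hours=1), timedelta(hours=24)))
# ['full-20260526000000', 'increment-20260526010000', 'increment-20260526020000']
```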

Remote incremental uses the same base_backup concept as native but extends it to the upload path — the tool computes the per-part diff before uploading, so an incremental that produces 200 MB of new data uploads 200 MB, not the full table. The command is clickhouse-backup upload --diff-from-remote=<previous-backup-name>:

clickhouse-backup create incremental-2026-05-27
clickhouse-backup upload incremental-2026-05-27 \
  --diff-from-remote=full-2026-05-26

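The on-disk create always materializes the full set of hard links; the savings happen at upload time. That distinction matters because /var/lib/clickhouse/backup/ will briefly hold a full freeze even for an “incremental”.

The hard-linked files cost neither data blocks nor new inodes, since they share inodes with the live files. However, every part gets a fresh directory in the backup tree, and each directory takes an inode, so tables with very many parts need inode headroom. Any parts that merges retire while the local backup exists also stay pinned on disk until delete local runs.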
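Two properties of hard links decide whether a local freeze stays cheap:

  • the link must stay on one filesystem, and
  • the new directories need free inodes.

Below is a POSIX-only preflight sketch using os.stat and os.statvfs. The paths and the inode threshold are assumptions you would set yourself. Some filesystems (btrfs, for example) report inode counts that make the second check meaningless.

```python
import os
from typing import List


def same_filesystem(a: str, b: str) -> bool:
    """Hard links cannot cross filesystems; st_dev identifies the filesystem."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def free_inodes(path: str) -> int:
    """Inodes available to non-root users on the filesystem holding `path`."""
    return os.statvfs(path).f_favail


def preflight(data_dir: str, backup_dir: str, min_free_inodes: int) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not same_filesystem(data_dir, backup_dir):
        problems.append(f"{backup_dir} is not on the same filesystem as {data_dir}")
    if free_inodes(backup_dir) < min_free_inodes:
        problems.append(f"fewer than {min_free_inodes} free inodes under {backup_dir}")
    return problems
```

On a real host you might call preflight("/var/lib/clickhouse", "/var/lib/clickhouse/backup", 100_000) before create. The threshold there is a made-up number.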

The tool also supports partition-level granularity through --partitions:

clickhouse-backup create \
  --tables=analytics.events \
  --partitions="202604,202605" \
  recent-events

That’s the right pattern when a very large table needs its recent partitions backed up far more often than the rest. For example, you might take hourly backups of the current and previous month’s partitions of a table that holds years of history, and back up the older partitions on a much slower schedule.
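Computing the partition list by hand is where off-by-one-month mistakes creep in. Here is a small helper that assumes the table is PARTITION BY toYYYYMM(ts), so partition IDs are YYYYMM values. The function name is hypothetical.

```python
from datetime import date


def recent_month_partitions(today: date, months: int) -> str:
    """Comma-separated YYYYMM partition IDs for the last `months` months, oldest first."""
    year, month = today.year, today.month
    ids = []
    for _ in range(months):
        ids.append(f"{year}{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return ",".join(reversed(ids))


print(recent_month_partitions(date(2026, 5, 26), 2))  # 202604,202605
```

The returned string is what you would pass as the --partitions value.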


FREEZE PARTITION mechanics

Both tools above rely on the hard-link mechanism behind ALTER TABLE ... FREEZE PARTITION, and clickhouse-backup calls it directly. Understanding what it actually does makes the rest of the backup story much easier to reason about. It explains why backups can be near-instantaneous on multi-TB tables, and why a freeze that fails in the middle still leaves the database in a consistent state.

When you run ALTER TABLE events FREEZE PARTITION 202605 WITH NAME 'daily-backup', ClickHouse iterates through all active parts of the named partition, or every partition if you omit the PARTITION clause. For each part directory it creates a hard link to every file under /var/lib/clickhouse/shadow/<name>/.

Under that directory, the layout mirrors the table’s data path:

  • data/<db>/<table>/<part>/ for Ordinary databases;
  • store/<prefix>/<uuid>/<part>/ for the default Atomic engine.

Without WITH NAME, <name> is a monotonically increasing integer that ClickHouse manages internally. With WITH NAME 'daily-backup', the directory is shadow/daily-backup/ instead, so you can tell which backup the shadow directory belongs to.

Because hard links share inodes with the originals, this operation is essentially free: it does no I/O of file contents, takes milliseconds per part, and consumes negligible additional space until a merge runs. When the merge eventually rewrites a part, the old part file is deleted only from the live data/ tree — the hard link in shadow/ keeps the inode alive, and the backup is preserved. This is the property that makes ClickHouse backups so much cheaper to take than backups of mutable-row databases: you’re snapshotting an immutable layer rather than racing against writes.
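The “merge deletes the file, the backup survives” behavior is plain POSIX hard-link semantics, and you can check it on any Linux or macOS machine. Here is a self-contained Python demonstration; the file names merely stand in for a live part file and its shadow/ twin.

```python
import os
import tempfile
from typing import Tuple


def hard_link_survives_delete() -> Tuple[bool, int, bytes]:
    """Return (same inode?, link count after linking, bytes read after delete)."""
    with tempfile.TemporaryDirectory() as root:
        live = os.path.join(root, "live_part.bin")      # stands in for a file in data/
        frozen = os.path.join(root, "frozen_part.bin")  # stands in for its shadow/ link
        with open(live, "wb") as f:
            f.write(b"column data")
        os.link(live, frozen)                           # what FREEZE does for each file
        same_inode = os.stat(live).st_ino == os.stat(frozen).st_ino
        link_count = os.stat(frozen).st_nlink           # two names, one inode
        os.remove(live)                                 # a merge retiring the old part
        with open(frozen, "rb") as f:
            return same_inode, link_count, f.read()


print(hard_link_survives_delete())  # (True, 2, b'column data')
```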

What you do after the freeze depends on your tool:

  • Native BACKUP does not use shadow/. It holds temporary hard links to the parts it is backing up, streams the files straight to the destination, and drops the links when done.
  • clickhouse-backup copies the shadow tree into /var/lib/clickhouse/backup/<name>/ and then uploads from there.
  • A homegrown pipeline running raw FREEZE would rsync or aws s3 sync the shadow/daily-backup/ directory to durable storage, then call ALTER TABLE ... UNFREEZE WITH NAME 'daily-backup' to clean up:

ALTER TABLE analytics.events FREEZE WITH NAME 'daily-backup';
-- copy /var/lib/clickhouse/shadow/daily-backup/ to durable storage;
-- FREEZE copies data only, so save SHOW CREATE TABLE analytics.events too
ALTER TABLE analytics.events UNFREEZE WITH NAME 'daily-backup';
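The easiest way to never forget UNFREEZE is to make it structurally unavoidable. Here is a sketch of the raw-freeze pipeline in Python. It assumes clickhouse-client can connect with its defaults and that the copy step is whatever command you already use (aws s3 sync, rsync). The function names and the copy command are placeholders. Table and backup names are interpolated into SQL, so only pass trusted values.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Run a command; raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def sql(query: str) -> List[str]:
    """Build a clickhouse-client invocation for a single statement."""
    return ["clickhouse-client", "--query", query]


def freeze_copy_unfreeze(table: str, name: str, copy_cmd: Sequence[str],
                         runner: Runner = run) -> None:
    """FREEZE the table, run the copy command, and UNFREEZE no matter what."""
    runner(sql(f"ALTER TABLE {table} FREEZE WITH NAME '{name}'"))
    try:
        runner(copy_cmd)  # e.g. sync /var/lib/clickhouse/shadow/<name>/ somewhere durable
    finally:
        # Runs even when the copy fails, so hard links never pile up in shadow/.
        runner(sql(f"ALTER TABLE {table} UNFREEZE WITH NAME '{name}'"))
```

Unfreezing even when the copy fails trades away that attempt’s snapshot in exchange for never leaking hard links; a retry simply freezes again.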

UNFREEZE removes the hard links and lets a future merge actually reclaim the space. Forgetting to UNFREEZE is the most common operational mistake with the raw-freeze pattern — months later, you discover that /var/lib/clickhouse/shadow/ has grown to the same size as your active data because merges couldn’t free anything. The two managed tools handle the unfreeze automatically; you only need to think about it if you’re driving the freeze yourself.
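A periodic audit of shadow/ catches forgotten freezes long before the disk fills. The sketch below reports apparent size and age for each entry. It assumes a local disk whose shadow directory you pass in. Apparent size overstates the real cost for parts that are still live, because those bytes are shared with the data directory.

```python
import os
import tempfile
import time
from typing import Dict, Tuple


def shadow_report(shadow_dir: str) -> Dict[str, Tuple[int, float]]:
    """Map each shadow/<name> directory to (apparent bytes, age in days)."""
    report = {}
    now = time.time()
    for entry in sorted(os.listdir(shadow_dir)):
        top = os.path.join(shadow_dir, entry)
        if not os.path.isdir(top):
            continue  # e.g. the increment.txt counter ClickHouse keeps here
        total = 0
        for dirpath, _dirs, files in os.walk(top):
            for fname in files:
                total += os.lstat(os.path.join(dirpath, fname)).st_size
        report[entry] = (total, (now - os.stat(top).st_mtime) / 86400)
    return report


# Demo on a throwaway tree shaped like a real shadow/ directory.
with tempfile.TemporaryDirectory() as shadow:
    part = os.path.join(shadow, "daily-backup", "store", "abc", "part_1")
    os.makedirs(part)
    with open(os.path.join(part, "col.bin"), "wb") as f:
        f.write(b"x" * 10)
    print(shadow_report(shadow))
```

In production you would point it at /var/lib/clickhouse/shadow and flag any entry older than your longest expected backup run.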

Replicated tables and Keeper

Backing up ReplicatedMergeTree tables introduces one extra concern. Replicas have their own copies of the parts, and the metadata that ties them together lives in ClickHouse Keeper (or ZooKeeper, on older deployments).

For most backup purposes this is invisible. A BACKUP TABLE replicated_events issued on any replica produces a complete snapshot of that shard’s data, because every replica of a shard holds a full copy of it. The Keeper paths are not included in the backup, but on restore, ClickHouse will create new Keeper paths under whatever ZooKeeper path the destination cluster uses.

The case where this gets interesting is restoring to a brand-new cluster that has its own Keeper. Native RESTORE handles this through the allow_non_empty_tables and structure_only settings, plus a clean Keeper path:

RESTORE TABLE analytics.events
FROM S3(s3_backups, 'events/2026-05-26')
SETTINGS allow_non_empty_tables = 0;

When you restore a replicated table to a new cluster, the ReplicatedMergeTree engine arguments in the original DDL (the /clickhouse/tables/{shard}/... path and replica name) get applied as written. If the source cluster used /clickhouse/tables/{shard}/events and the destination cluster has nothing under that path, the restore creates fresh Keeper metadata and the new cluster’s replicas re-sync from each other after the restore. If the destination cluster already has different data at that Keeper path, the restore will fail with REPLICA_ALREADY_EXISTS — the safe pattern is to either drop the existing tables first or restore with AS into a new database name and migrate manually.

For clickhouse-backup, the equivalent staged restore uses --rm together with --schema and --data:

clickhouse-backup restore --rm --schema 2026-05-26
# verify schemas
clickhouse-backup restore --data 2026-05-26

This two-step pattern is worth knowing because it lets you catch schema-incompatibility issues (a column type changed between backup and restore) before you commit to copying the data.

Incremental backups in practice

Incremental backups sound like a pure win — only ship the deltas, save bandwidth and storage — but they introduce a coupling between backups that complicates retention and lengthens restore time. The right way to think about them is:

  • Storage cost goes down, often dramatically, for append-only tables that are never OPTIMIZEd. An hourly incremental of a write-heavy events table can be tens of MBs instead of TBs.
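  • Restore time and fragility go up with chain length. Restoring an incremental means reading every manifest in the chain and pulling parts from several backup locations, and one missing link fails the whole restore. A weekly full + 168 hourly incrementals is noticeably slower and riskier to restore than a daily full.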
  • Retention gets harder. You can’t delete a base backup until you no longer need anything that depends on it. Deleting the wrong backup silently breaks the dependents — there’s no transactional check.
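Nothing stops you from deleting a base that other backups depend on, so it is worth computing the safe-to-delete set explicitly before pruning. This sketch assumes you track each backup’s base yourself, for example in the naming scheme or a small catalog. The backup names are hypothetical.

```python
from typing import Dict, Optional, Set


def deletable(bases: Dict[str, Optional[str]], keep: Set[str]) -> Set[str]:
    """Backups that can be removed without breaking any backup in `keep`.

    `bases` maps each backup name to the backup it was taken against
    (None for a full). Walking base links from every kept backup finds
    everything that must stay.
    """
    required: Set[str] = set()
    for name in keep:
        current: Optional[str] = name
        while current is not None and current not in required:
            required.add(current)
            current = bases[current]
    return set(bases) - required


bases = {
    "full-0519": None, "incr-0520": "full-0519", "incr-0521": "incr-0520",
    "full-0526": None, "incr-0527": "full-0526",
}
print(sorted(deletable(bases, keep={"incr-0527"})))
# ['full-0519', 'incr-0520', 'incr-0521']
```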

The pattern that works in practice is weekly full + daily incrementals, with two weeks of fulls kept and the chain reset every week. The mental model is: the worst-case restore is one full + six incrementals, and you tolerate that for the savings on storage and on the time you spend re-uploading 50 TB every night. For very large tables (multi-PB), some teams go to weekly full + hourly incrementals, but at that scale the restore-chain length starts to dominate restore time and you want to test the restore quarterly.

The math to validate the tradeoff is straightforward:

incremental_storage_per_day = avg_daily_new_data_size
full_storage_cost = table_size * num_fulls_retained
chain_restore_time = full_restore_time + sum(incremental_restore_times)

If chain_restore_time exceeds your RTO (recovery time objective), you need either shorter chains or a different restore strategy (e.g., parallel restore by partition, or a continuously replicated standby cluster).
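The three formulas above can be turned into a quick what-if calculator. This Python sketch follows the same additive restore model, which errs on the pessimistic side. The throughput figure is an assumption; replace it with a number measured in your own restore drills. The example sizes are illustrative, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class BackupPlan:
    table_size_gb: float            # size of one full backup
    new_gb_per_incremental: float   # average new data captured by each incremental
    incrementals_per_chain: int     # e.g. 6 for weekly full + daily incrementals
    fulls_retained: int             # each retained full keeps its chain
    restore_gb_per_hour: float      # measured in a restore drill, not guessed


def storage_gb(p: BackupPlan) -> float:
    """full_storage_cost plus the incrementals hanging off every retained full."""
    per_chain = p.table_size_gb + p.new_gb_per_incremental * p.incrementals_per_chain
    return per_chain * p.fulls_retained


def chain_restore_hours(p: BackupPlan) -> float:
    """full_restore_time + sum(incremental_restore_times), at the end of a chain."""
    full_h = p.table_size_gb / p.restore_gb_per_hour
    incr_h = p.incrementals_per_chain * p.new_gb_per_incremental / p.restore_gb_per_hour
    return full_h + incr_h


def meets_rto(p: BackupPlan, rto_hours: float) -> bool:
    return chain_restore_hours(p) <= rto_hours


plan = BackupPlan(50_000, 500, 6, 2, 5_000)
print(storage_gb(plan), chain_restore_hours(plan), meets_rto(plan, 8))
```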

Disaster recovery patterns

A backup that has never been restored is not a backup. The most common failure mode for ClickHouse backup pipelines is not the backup itself — clickhouse-backup and native BACKUP are both stable and well-tested — but the discovery, mid-incident, that the restore does not work because of some assumption that was never tested.

The minimum DR drill is:

  1. Quarterly full-restore drill to a separate cluster (or even a separate ClickHouse process on a separate host). Restore the most recent full backup, run a known query against a known partition, compare the result to production. If you can, automate this as a CI job.
  2. Monthly partial-restore drill of a single recent partition via RESTORE TABLE events AS sandbox.events_check FROM S3(...) followed by SELECT count(*), min(ts), max(ts) FROM sandbox.events_check. This catches credential rotation, S3 bucket policy changes, and schema drift much faster than the quarterly drill.
  3. Backup-failure alerting, not just success. Most teams only watch the backup job’s own result: a cron exit code, or a green GitHub Action. The signal that actually matters is whether a BACKUP_CREATED row appears in system.backups (for native), or whether the latest backup name in clickhouse-backup list remote is from the last 24 hours.

For point-in-time recovery — the “I want to roll back to 11:53 last Tuesday” case — neither native BACKUP nor clickhouse-backup give you a continuous-log mechanism out of the box. The closest pattern is hourly incrementals plus a Kafka or Pulsar source-of-truth that you can re-consume from a known offset. ClickHouse’s Kafka table engine makes this pattern natural: keep the Kafka topic’s retention longer than your incremental cadence (say, 7 days for hourly), and to recover to a precise point you restore the latest incremental before the desired time and then replay Kafka forward.
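The recovery step above reduces to picking the newest backup at or before the target time, then checking that the topic still holds everything since then. Here is a Python sketch of that selection. It uses timestamps as a simplification. A real pipeline should record Kafka consumer offsets at backup time, because replaying by timestamp can re-insert rows that were already captured in the backup.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Tuple


def recovery_plan(backup_times: List[datetime], target: datetime,
                  now: datetime, kafka_retention: timedelta) -> Tuple[datetime, timedelta]:
    """Return (backup to restore, length of Kafka replay needed after it).

    Chooses the newest backup taken at or before `target`. Replay starts at
    that backup, so the topic has to retain messages at least that far back.
    """
    times = sorted(backup_times)
    i = bisect_right(times, target)
    if i == 0:
        raise ValueError("no backup at or before the target time")
    chosen = times[i - 1]
    if now - chosen > kafka_retention:
        raise ValueError("Kafka retention no longer covers the replay window")
    return chosen, target - chosen
```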

For ClickHouse Cloud, backups are scheduled and managed by the platform, so you don’t operate the backup pipeline yourself. The patterns above apply mostly to self-hosted clusters.

Common pitfalls

A handful of issues catch teams the first time they take backups seriously.

RBAC is not backed up by BACKUP TABLE or BACKUP DATABASE. Roles, users, row policies, and quotas live in system.users, system.roles, etc., and need their own backup path. Native BACKUP supports BACKUP ALL which includes access-control metadata, but if you’re backing up specific databases you need to also back up access entities separately with BACKUP TABLE system.users, TABLE system.roles, TABLE system.grants TO ... (for SQL-managed access control only) or carry the access DDL in a Terraform/migration repository. clickhouse-backup has --rbac and --configs flags for the same purpose.

system.* tables are intentionally excluded. system.query_log, system.part_log, etc. are operational data and not part of any backup mechanism. If you need historical query logs for compliance, you have to forward them to a separate store (S3, OpenSearch, ClickHouse Cloud, etc.) outside the backup loop.

Schema drift between backup and restore breaks the restore. If you alter a column type or add a column between the backup and the restore destination, native RESTORE will fail with a type-mismatch error. The fix is to apply the schema migration on the destination first, then RESTORE with structure_only = 0, allow_non_empty_tables = 1 so the structure isn’t re-applied but the data is loaded into the already-existing table. The two-step restore --schema / restore --data workflow in clickhouse-backup handles the same case.

ON CLUSTER does not parallelize backups. BACKUP TABLE ... ON CLUSTER cluster_name TO ... runs the backup on every node in the cluster, but each node tries to write to the same destination path. For replicated tables that’s redundant work (every replica has the same data) and for sharded tables it doesn’t produce a coherent backup. The right pattern for a sharded cluster is one backup per shard, each writing to a distinct path, coordinated by an external scheduler.

S3 lifecycle rules can silently delete backup parts. A common cost-optimization is to add an S3 lifecycle rule that transitions objects older than 30 days to Glacier or deletes them. Applied to the backup bucket without thought, this can destroy your retention chain — incrementals that depend on a base in Glacier won’t restore without first restoring the Glacier object back to Standard. Either keep the backup bucket on a lifecycle policy you understand precisely, or store backups in a bucket with no lifecycle at all and prune via clickhouse-backup delete remote (which respects retention configuration).

Multi-disk storage layouts complicate FREEZE. If your table uses storage policies with multiple disks (typical for hot/cold tiering), FREEZE creates a shadow/N/ directory on every disk, under that disk’s own path, rather than a single tree under /var/lib/clickhouse/shadow/. The two managed tools handle this correctly, but a homegrown rsync script needs to walk each disk’s shadow directory rather than assuming a single root.
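A homegrown script can ask ClickHouse for its disk paths (the path column of system.disks) and then look for the named freeze on each one. Here is a sketch of the second half. It assumes local disks only, since object-storage disks behave differently, and hypothetical mount points such as /var/lib/clickhouse/ and /mnt/cold/clickhouse/.

```python
import os
import tempfile
from typing import List


def frozen_dirs(disk_paths: List[str], name: str) -> List[str]:
    """Every <disk path>/shadow/<name> directory that exists; copy all of them."""
    found = []
    for disk in disk_paths:
        candidate = os.path.join(disk, "shadow", name)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


# Demo: two fake disks, only the "cold" one holds parts frozen as daily-backup.
with tempfile.TemporaryDirectory() as hot, tempfile.TemporaryDirectory() as cold:
    os.makedirs(os.path.join(cold, "shadow", "daily-backup"))
    print(frozen_dirs([hot, cold], "daily-backup"))
```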

Frequently asked questions

Is clickhouse-backup still recommended now that ClickHouse has native BACKUP/RESTORE? Yes, for specific cases. The native BACKUP command is the right default for ClickHouse Cloud, and for self-hosted clusters that don’t need anything beyond “back up to S3 nightly”.

clickhouse-backup is still the better choice when you need its specific extras:

  • watch mode for hands-off scheduling;
  • SFTP, FTP and rclone-style destinations, which the BACKUP statement doesn’t expose as first-class targets (native covers S3 and S3-compatible endpoints, plus Azure Blob Storage);
  • partition-level filters in --partitions;
  • the --rbac / --configs flags for backing up access control and server configuration.

If none of those apply, native is simpler.

Where do backup files live before they get uploaded? Native BACKUP streams directly to the destination without an intermediate local copy.

clickhouse-backup materializes the local backup under /var/lib/clickhouse/backup/<backup-name>/ first and uploads from there. The hard-linked files cost no extra bytes or inodes at that point, but every part directory in that tree needs an inode of its own, so a table with very many parts needs inode headroom.

Raw FREEZE PARTITION writes hard links into /var/lib/clickhouse/shadow/<increment>/ (or shadow/<name>/ when you use WITH NAME) and leaves them there until you call UNFREEZE.

Does taking a backup block writes or merges? No. The freeze step takes a snapshot via hard links, which is non-blocking at the filesystem level, and merges continue against the live parts while the backup uploads. The main resources the backup contends for are disk read I/O and upload bandwidth, plus CPU if you compress. That is why scheduling backups during low-traffic windows still matters, even though the backup itself doesn’t lock the table.

How do I back up just one partition? With native BACKUP, use BACKUP TABLE events PARTITION 202605 TO S3(...) (the partition value must match your PARTITION BY expression — for toYYYYMM(ts) that is the integer YYYYMM, e.g. 202605). With clickhouse-backup, pass --partitions="202605" to create. Partition-scoped backups are the right pattern when you have a very large table with high-value recent data and lower-value historical data — back up recent partitions hourly and historical partitions monthly.

Can I restore a single table from a database-level backup? Yes. RESTORE TABLE events FROM S3(...) works against a backup that was created with BACKUP DATABASE analytics — you don’t have to restore the whole database to pull a single table. clickhouse-backup exposes the same via --tables=analytics.events on restore. This is the right pattern for “we accidentally dropped one table, give us back just that one”.

What happens to a MaterializedView on restore? Materialized views are restored as schema only. They don’t carry their own data, because the data lives in the underlying target table. After a restore, the materialized view will resume processing new inserts but won’t reprocess historical inserts.

If you need the historical materialized-view results, back up the target table rather than relying on the view definition itself. That is the table named in TO, or the hidden inner table, which is called .inner_id.<uuid> in Atomic databases and .inner.<mv_name> in older Ordinary ones. See our ClickHouse materialized views in production post for more on the target-table pattern.

How do I verify a backup actually works without restoring over production? Restore with AS into a different database or table: RESTORE TABLE events AS sandbox.events_check FROM S3(...). Run a known-result query against the sandbox table — SELECT count(), min(timestamp), max(timestamp), uniqExact(user_id) is a useful spot-check that catches both missing rows and corrupted columns. For clickhouse-backup, restore into a separate ClickHouse instance via --rm and the clickhouse.host setting in a separate config file.
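The spot-check only proves something if both sides describe data that cannot have changed since the backup, so it helps to scope it to one closed partition. Below is a Python sketch that builds the query for each side and compares the results. It assumes the timestamp and user_id columns from the example above and uses ClickHouse’s _partition_id virtual column. You would run the query with clickhouse-client --query and feed each output line to parse_row. The helper names are hypothetical.

```python
from typing import List, NamedTuple


class Stats(NamedTuple):
    rows: int
    min_ts: str
    max_ts: str
    users: int


def spot_check_sql(table: str, partition_id: str) -> str:
    """Same spot-check query for prod and sandbox, scoped to one closed partition."""
    return ("SELECT count(), min(timestamp), max(timestamp), uniqExact(user_id) "
            f"FROM {table} WHERE _partition_id = '{partition_id}' FORMAT TabSeparated")


def parse_row(line: str) -> Stats:
    """Parse one TabSeparated result row into Stats."""
    rows, lo, hi, users = line.rstrip("\n").split("\t")
    return Stats(int(rows), lo, hi, int(users))


def mismatches(prod: Stats, restored: Stats) -> List[str]:
    """Names of the fields that differ; an empty list means the check passed."""
    return [f for f in Stats._fields if getattr(prod, f) != getattr(restored, f)]
```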

How long should I retain backups? The right answer depends on your compliance and DR requirements, not on a generic best practice. Two patterns are common: operational retention of 30 days of daily backups for “we deleted something we needed” recovery, and regulatory retention of 7 years of monthly backups in cheap deep-archive storage for “auditor wants Q3 2024” requests. Storage cost differences between these are large enough that you usually want them in different buckets with different lifecycle policies.
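The two tiers above can be expressed as a keep-set over backup dates. This Python sketch assumes every dated backup is a self-contained full; if the kept backups are incrementals, their bases must be kept too. The function name and the 30-day and 7-year defaults are illustrative, taken from the two patterns just described.

```python
from datetime import date, timedelta
from typing import Dict, List, Set, Tuple


def keep_set(backup_dates: List[date], today: date,
             daily_days: int = 30, monthly_years: int = 7) -> Set[date]:
    """Backups to keep under a two-tier policy; everything else may be pruned.

    Operational tier: every backup from the last `daily_days` days.
    Regulatory tier: the first backup of each month for `monthly_years` years.
    """
    keep = {d for d in backup_dates if today - d < timedelta(days=daily_days)}
    cutoff = date(today.year - monthly_years, today.month, 1)
    first_of_month: Dict[Tuple[int, int], date] = {}
    for d in sorted(backup_dates):
        if d >= cutoff:
            first_of_month.setdefault((d.year, d.month), d)
    return keep | set(first_of_month.values())
```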

Can I run BACKUP while a heavy OPTIMIZE is running? Yes. OPTIMIZE produces new parts and eventually deletes the old ones, but the freeze inside BACKUP captures a snapshot of the parts that exist at freeze time — including the pre-OPTIMIZE parts if the freeze beats the merge. You may end up backing up parts that are about to be cleaned up, which is harmless but slightly wastes upload bandwidth. If your nightly OPTIMIZE workload is very heavy, scheduling the backup after the optimize finishes will produce a slightly smaller backup.

Does the backup include INDEX definitions like skip indexes and projections? Yes — both native BACKUP and clickhouse-backup capture full table DDL including secondary indexes, projections, and primary key / partition definitions. The actual index data (for skip indexes) and projection parts are stored alongside the main table data in the backup, so a restore reconstructs both the schema and the precomputed structures.

Wrapping up

The honest summary of ClickHouse backups in 2026 is: the mechanics are easy, the operational discipline around them is what catches teams out. Pick the tool that fits your workflow (native BACKUP for most cases, clickhouse-backup when you need its extras), put it on a schedule with retention you understand, and run restore drills often enough that you trust the path. The underlying FREEZE PARTITION primitive makes the act of taking a backup nearly free; the work is in proving that you can restore it.

If you’re running ClickHouse at the scale where this matters, the rest of our ClickHouse series covers the other production levers: materialized views for query-time precomputation, projections for alternate sort orders without duplicating data, joins for the patterns that actually scale, LowCardinality for the encoding that makes string columns cheap, ReplacingMergeTree for upsert semantics, and the partition vs order vs primary key guide for the schema decisions backups depend on. And when you need a SQL editor that connects to ClickHouse without a desktop install — useful for the “run a sanity query against a sandbox restore” workflow — the QueryPlane ClickHouse integration takes a connection string and gives you a browser tab with autocomplete, query history, and shareable results.