Menu
Blog Documentation Community Pricing Demo Call Sign Up
Sign Up

Snowflake Iceberg Tables in Practice

Running Snowflake Iceberg tables in production: Snowflake-managed vs externally-managed tables, catalog options, the cost model, and footguns that show up under load.

Snowflake

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.


Apache Iceberg has quietly become the format every serious data platform agrees on. Snowflake, Databricks, BigQuery, Trino, Spark, DuckDB, and Dremio all read and write the same on-disk format, so a single dataset in S3 can be queried by every engine your team uses. For Snowflake teams in particular, Iceberg tables are the answer to two long-standing problems: paying twice to keep a copy of data inside Snowflake when it already lives in a data lake, and being locked into a single engine for analytics on warehouse data.

This post is about what changes when you actually run Iceberg tables in production on Snowflake — not the marketing pitch. There are two flavors of Iceberg table in Snowflake (Snowflake-managed and externally-managed), three realistic catalog choices, a cost model that looks different from native FDN tables, and a handful of footguns that only show up under load. We’ll cover all of them, with the SQL you’d actually run.

In this post, we’ll cover:

  • Two flavors of Iceberg table — Snowflake-managed vs externally-managed, when each fits
  • Catalogs — Snowflake’s built-in catalog, Polaris, AWS Glue, and the REST catalog interop story
  • External volumes — the storage layer Iceberg tables sit on, IAM, and the multi-cloud surface
  • Creating tables — full DDL for both flavors, file formats, partitioning, sort orders
  • Cross-engine interop — reading the same table from Spark / Trino / BigQuery
  • Performance — what costs less, what costs the same, and where the gap shows up
  • Maintenance — compaction, snapshot expiration, orphan file cleanup
  • Migration — from native FDN, from external tables, from S3 Parquet
  • Production pitfalls — six things that bit teams in the first year of GA

The two flavors of Iceberg table

Snowflake exposes Iceberg through one DDL statement (CREATE ICEBERG TABLE) but with two very different operational models behind it, and confusing them is the most common source of “why doesn’t this work” questions.

A Snowflake-managed Iceberg table is one Snowflake owns end to end: it writes the data files, manages the metadata, runs the catalog, and is the only engine that can write to the table. Other engines can read it through a catalog SDK or via the Iceberg REST catalog, but writes always go through Snowflake. The storage sits in your own cloud bucket (an “external volume”), not in Snowflake’s internal stage, which means you can point Spark or Trino at the same files for reads while keeping Snowflake’s full feature set on writes — time travel, zero-copy clone, masking policies, row access policies, the lot.

An externally-managed Iceberg table (docs) is the opposite: another system writes the data files and owns the catalog, and Snowflake is a read-mostly consumer. The “another system” is typically AWS Glue, an Apache Polaris catalog, or a REST catalog running somewhere. Snowflake reads the table by calling the external catalog to find the current metadata pointer, then reads the Parquet files directly from object storage. Some externally-managed tables also support DML from Snowflake (writes), but the rules depend on the catalog — Glue does, REST catalogs vary by implementation, and Polaris is moving toward full read/write.

The decision rule we use: if Snowflake is the system of record for the data, make it Snowflake-managed. If another team or pipeline (Spark jobs, dbt-on-Trino, Flink, a CDC consumer) writes the table and Snowflake just needs to query it, make it externally-managed. The common mistake is choosing externally-managed because “we want to keep the data outside Snowflake” — Snowflake-managed tables already keep the data outside Snowflake (in your own bucket), with a much cleaner ops story.

Catalogs

The Iceberg catalog is the piece that tells you which metadata file is the current one for a table. Snowflake supports three meaningful catalog choices.

Snowflake’s own catalog is what you get with Snowflake-managed tables. There’s nothing to configure: when you CREATE ICEBERG TABLE ... CATALOG = 'SNOWFLAKE', Snowflake maintains the catalog pointers inside its own metadata service. External engines can read this catalog through the Iceberg REST catalog endpoint that Snowflake exposes per account, which speaks the standard Iceberg REST protocol and is what Spark / Trino / PyIceberg connect to.

AWS Glue is the most common external catalog choice for teams already on AWS. Snowflake reads the Glue Data Catalog via an AWS Glue catalog integration, which requires an IAM role Snowflake can assume to call Glue’s API. Glue is a fine choice when you have existing Spark or EMR jobs writing Iceberg tables and registering them in Glue.

Apache Polaris is the newer, open option — originally donated by Snowflake to the Apache Software Foundation in 2024 and now graduating as a top-level project. Polaris is engine-agnostic by design, supports both Snowflake-managed and externally-managed tables, and runs on whatever infrastructure you choose. Snowflake released Snowflake Open Catalog as a hosted Polaris offering, which is the easiest path if you want Polaris semantics without operating it yourself. If you’re starting from scratch on multi-engine workflows in 2026, Polaris is the bet that doesn’t lock you into Glue or any one vendor’s catalog.

A useful detail: an externally-managed Iceberg table on Snowflake doesn’t store the catalog metadata twice. Snowflake calls the external catalog every time someone reads the table to discover the current metadata pointer, then caches the result for a short window. That’s why a write from Spark is visible to Snowflake within seconds of the catalog commit — there’s no separate “sync” step to schedule.

External volumes

Every Iceberg table on Snowflake (managed or external) sits on storage you own, and Snowflake reaches that storage through an external volume. The external volume is a Snowflake object that bundles an S3/Azure/GCS location with the IAM trust relationship Snowflake needs to read and write there.

Creating one looks like this:

CREATE EXTERNAL VOLUME iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME            = 'us-east-1-iceberg',
      STORAGE_PROVIDER = 'S3',
      STORAGE_BASE_URL = 's3://acme-iceberg-prod/',
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeIcebergRole',
      STORAGE_AWS_EXTERNAL_ID = 'iceberg_external_id'
    )
  )
  ALLOW_WRITES = TRUE;

The ALLOW_WRITES flag is what gates Snowflake-managed write capability. Set it to TRUE for buckets that should accept new data files, and FALSE for read-only mirrors. The IAM trust relationship on the AWS side has to allow Snowflake’s account-specific external ID — Snowflake exposes both the IAM user it’ll use and the external ID via DESC EXTERNAL VOLUME iceberg_vol, and you copy those into the role’s trust policy.

A subtle point: an external volume can hold storage locations across multiple regions, and Snowflake will pick the one closest to the warehouse that’s running the query. For multi-region setups this matters a lot — if your warehouses are in us-east-1 but the volume’s only location is eu-west-1, you’ll pay full cross-region S3 transfer on every read. Multi-region volumes are also the cleanest way to handle disaster recovery: replicate the underlying S3 buckets with S3 cross-region replication and add the replica location to the volume.

Creating a Snowflake-managed Iceberg table

The DDL for a Snowflake-managed table is almost the same shape as a native FDN table, with two extra clauses:

CREATE ICEBERG TABLE orders (
  order_id     NUMBER,
  customer_id  NUMBER,
  region       STRING,
  status       STRING,
  total_cents  NUMBER,
  ordered_at   TIMESTAMP_NTZ
)
  EXTERNAL_VOLUME = 'iceberg_vol'
  CATALOG         = 'SNOWFLAKE'
  BASE_LOCATION   = 'orders/';

The BASE_LOCATION is the relative path under the external volume’s base URL where Snowflake writes the data and metadata files. We use one path per table — orders/, customers/, events/raw/ — so the on-disk layout in S3 stays human-navigable. Snowflake creates two subdirectories under that path: data/ for Parquet files and metadata/ for v*.metadata.json and manifest files.

Once created, the table behaves like a normal Snowflake table for almost every operation. You INSERT, MERGE, UPDATE, DELETE against it; you can attach masking policies and row access policies; clones, time travel, and zero-copy clone all work. The differences are operational: data files end up in your bucket (you can aws s3 ls to see them), and reads from external engines are possible (more on this in the interop section).

Partitioning and sort orders

Iceberg has its own partitioning concept — hidden partitioning — that’s different from Snowflake’s micro-partitions. With hidden partitioning, you declare a partition column derived from a base column (e.g. month(ordered_at)), and Iceberg’s planning layer prunes based on that derivation without the query needing to mention the partition column. Snowflake supports declaring Iceberg partition specs at table creation:

CREATE ICEBERG TABLE events (
  event_id     STRING,
  user_id      NUMBER,
  event_type   STRING,
  payload      VARIANT,
  occurred_at  TIMESTAMP_NTZ
)
  EXTERNAL_VOLUME = 'iceberg_vol'
  CATALOG         = 'SNOWFLAKE'
  BASE_LOCATION   = 'events/'
  PARTITION BY (DAY(occurred_at), event_type);

Pick a partition spec the same way you’d pick clustering keys on a native table: by the column the query layer most often filters on, with enough cardinality to prune but not so much that you end up with millions of tiny files. The Iceberg hidden-partition story is more flexible than Snowflake’s clustering keys for one reason: you can ALTER TABLE ... ADD PARTITION FIELD ... to change the partition spec without rewriting old data. Old partitions keep their old layout; new writes use the new spec. That’s a meaningful operational win when access patterns shift.

Sort orders are the other knob. CREATE ICEBERG TABLE ... ORDER BY (customer_id) tells Snowflake to write each new data file’s rows in customer_id order, which makes range scans and joins on customer_id significantly faster. Sort order doesn’t replace partitioning; it stacks on top of it. A well-tuned events table is often partitioned by DAY(occurred_at) and sort-ordered by user_id, giving you both temporal pruning and user-level locality within each partition.

Creating an externally-managed Iceberg table

For a table whose data is written by Spark, Trino, or some other system and registered in an external catalog, the DDL looks different:

-- One-time setup: catalog integration
CREATE CATALOG INTEGRATION glue_prod
  CATALOG_SOURCE         = GLUE
  CATALOG_NAMESPACE      = 'analytics'
  TABLE_FORMAT           = ICEBERG
  GLUE_AWS_ROLE_ARN      = 'arn:aws:iam::123456789012:role/SnowflakeGlueRole'
  GLUE_CATALOG_ID        = '123456789012'
  GLUE_REGION            = 'us-east-1'
  ENABLED                = TRUE;

-- Register an existing Iceberg table
CREATE ICEBERG TABLE orders_external
  CATALOG               = 'glue_prod'
  EXTERNAL_VOLUME       = 'iceberg_vol'
  CATALOG_TABLE_NAME    = 'orders'
  AUTO_REFRESH          = TRUE;

A few details worth understanding. CATALOG_NAMESPACE is the Glue database the table lives in. CATALOG_TABLE_NAME is the table’s name inside that database. AUTO_REFRESH = TRUE tells Snowflake to subscribe to catalog notifications (when supported) and re-read the metadata pointer on every query, so writes from the external system are visible within seconds. Without AUTO_REFRESH, you have to call ALTER ICEBERG TABLE orders_external REFRESH after external writes — fine for batch pipelines, painful for streaming.

The external volume in this case is still used: even though Snowflake didn’t write the data files, it needs read access to the same S3 location the Parquet files live in, and the external volume is how Snowflake authorizes that access. The volume’s STORAGE_BASE_URL must be a prefix of the path the catalog records for the table; otherwise you get Access Denied errors that look like a misconfigured IAM role but are actually a volume scope problem.

Cross-engine interop

The headline reason to use Iceberg is that the same table is queryable from every engine. Concretely, for a Snowflake-managed table called analytics.orders, you can configure PyIceberg (or Spark, or Trino) to read it like this:

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
  "snowflake",
  **{
    "type":            "rest",
    "uri":             "https://your-account.snowflakecomputing.com/api/v2/iceberg/",
    "warehouse":       "analytics",  # Snowflake DB.SCHEMA
    "token":           "<oauth-token>",
    "header.X-Iceberg-Access-Delegation": "vended-credentials",
  },
)

table = catalog.load_table("analytics.orders")
df = table.scan().to_arrow()

The vended-credentials mode is the part that makes this practical — Snowflake hands the client a short-lived AWS credential scoped to the table’s S3 path, instead of requiring the client to have its own IAM setup. The credential is short-lived (15 minutes by default), refreshed on demand, and scoped narrowly enough that a compromised client can only read the one table.

For an externally-managed table, the interop story is more direct: you point your other engines at the same catalog (Glue, Polaris, or REST) and the same external volume, and reads work without any Snowflake involvement at all. Snowflake just happens to be one of the consumers.

A practical pattern we see frequently: a Spark job writes a bronze Iceberg layer to S3 and registers it in Polaris. Snowflake reads bronze as an externally-managed table, runs cleanup and aggregation, and writes the silver and gold layers as Snowflake-managed Iceberg tables. dbt-on-Snowflake runs the gold-layer transformations. Trino, used by a separate ML team, reads the same gold layer back out via the Snowflake REST catalog. Three engines, one set of files, no copy-out step.

See what QueryPlane can build for you

Connect to your database, write SQL with AI, and build shareable apps — all from your browser.

Performance

The honest answer on Iceberg-vs-FDN performance is: similar for most workloads, with a few specific places where the gap shows up.

Reads against a well-partitioned Iceberg table typically run within 10-20% of the equivalent FDN table on the same warehouse. The reasons FDN tends to be faster are micro-partitions and the proprietary FDN format’s tighter integration with Snowflake’s vectorized execution: FDN files include column-min/max statistics in a layout the planner can use without any additional file open, while Iceberg’s per-file statistics live in manifest files that have to be read first. For queries that scan a small fraction of a large table — a single-day filter on a year-long table — both formats benefit roughly equally from partition pruning.

Where Iceberg has caught up in the last year: Snowflake’s vectorized scanner now handles Parquet with V2 row groups about as fast as FDN, and the planner uses Iceberg manifest statistics for partition pruning without falling back to file-level metadata reads.

Where Iceberg still costs more: writes. Every Snowflake write to an Iceberg table commits an Iceberg snapshot (a new metadata.json + a new manifest). On wide tables with frequent small writes, the per-commit overhead matters — a job that writes 1,000 small INSERT batches per minute will accumulate 1,000 snapshots per minute, growing the manifest list and slowing planning over time. The fix is the same as in any Iceberg deployment: write in larger batches, schedule compaction (more on this in the maintenance section), and consider expiring old snapshots aggressively if you don’t need long retention.

The other place to watch costs is cold reads of externally-managed tables. The first query against an externally-managed table after a metadata refresh pays for one round trip to the external catalog plus reading the new manifest list from S3. On a multi-petabyte table with thousands of partitions, that’s seconds of latency before any data is read. Hot reads — within a few minutes of the last query — hit Snowflake’s catalog cache and metadata cache, and the overhead disappears.

Maintenance

An Iceberg table is a mutable thing that accumulates state, and forgetting to maintain it is the #1 reason teams swear off the format six months in. There are three jobs that need to run on a schedule.

Compaction rewrites many small Parquet files into fewer larger ones. Snowflake-managed tables run compaction automatically by default; you can control it per-table with the ENABLE_DATA_COMPACTION parameter:

ALTER ICEBERG TABLE orders
  SET ENABLE_DATA_COMPACTION = TRUE;

For externally-managed tables, compaction is the responsibility of whatever system writes the table — typically a periodic Spark job calling Iceberg’s rewrite_data_files procedure. Without it, table planning gets slower over time as the manifest list grows.

Snapshot expiration deletes old snapshots so they stop counting against storage and metadata size. Snowflake-managed tables expire snapshots automatically based on the table’s DATA_RETENTION_TIME_IN_DAYS setting (the same parameter that controls time travel). For external tables, run expire_snapshots from Spark on whatever cadence matches your time-travel needs. A common starting point is 7 days for analytics tables, 30 days for tables that back BI dashboards.

Orphan file cleanup is the third job — there’s no automatic equivalent for it, and skipping it is what bloats your S3 bill over months. Iceberg writes new files for every commit and marks old files for deletion in a manifest, but the actual file deletion only happens during snapshot expiration. If a write job crashes mid-commit, you can end up with data files in S3 that no live snapshot references. Run Iceberg’s remove_orphan_files procedure once a week from Spark to clean those up. Snowflake-managed tables handle this automatically as part of their storage maintenance, but it’s worth confirming on the TABLE_STORAGE_METRICS view that table storage is staying in line with logical row counts.

Migration

Three common starting points map to three migration playbooks.

Native FDN → Snowflake-managed Iceberg. The simplest migration. CREATE ICEBERG TABLE orders_iceberg LIKE orders to create the new shape, then INSERT INTO orders_iceberg SELECT * FROM orders to copy the data. Swap names atomically with ALTER TABLE orders RENAME TO orders_fdn_archive; ALTER ICEBERG TABLE orders_iceberg RENAME TO orders;. Drop the archive after a holding period. Test compaction and external read access before swapping in production.

Snowflake external tables → Iceberg. Snowflake’s older external tables feature points at Parquet/CSV/JSON in S3 without an Iceberg catalog. Migrating to externally-managed Iceberg is a meaningful upgrade because you get schema evolution, time travel, and atomic writes. Use Spark or Snowflake’s own CONVERT TO ICEBERG procedure (where available) to register existing Parquet files into an Iceberg metadata layer. The data files stay in place; only the manifest is new.

Raw S3 Parquet → Iceberg. The “we’ve been writing Parquet ourselves” case. The simplest path is to write the existing data into a new Iceberg table from Spark with df.writeTo("catalog.namespace.table").create(), which produces a clean Iceberg metadata layer over the data. Pointing Snowflake at the result is a CREATE ICEBERG TABLE ... CATALOG = '...' CATALOG_TABLE_NAME = '...' away.

In all three cases, the trap to avoid is doing the migration in-place by hand-editing manifests or table metadata. Iceberg’s atomic-commit guarantees come from the catalog being the only place that decides which metadata.json is current. Hand-edits skip that path, leave the table in a state where different engines see different “current” snapshots, and are extremely hard to recover from. Always go through a writer (Snowflake DML, Spark API, PyIceberg) that talks to the catalog.

Production pitfalls

Six things that bit teams in the first year of Iceberg-table GA on Snowflake, in rough order of frequency.

Forgetting compaction. Already mentioned but worth repeating: an Iceberg table without compaction will perform worse and worse over months. Confirm ENABLE_DATA_COMPACTION is not disabled on Snowflake-managed tables; schedule rewrite_data_files for externally-managed tables. Monitor ICEBERG_STORAGE_OPTIMIZATION_HISTORY and TABLE_STORAGE_METRICS to track whether compaction is running and whether average file size is staying healthy — if it’s drifting below 64 MB on an active table, compaction is behind.

External volume scope mismatches. An external volume’s STORAGE_BASE_URL must be a prefix of the actual data path. If the volume is s3://acme-iceberg-prod/ and a Glue-registered table lives at s3://acme-iceberg-staging/orders/, queries fail with Access Denied. Easy to fix once you know the symptom, hard to find from the error message alone.

Vended credentials timing out mid-query in PyIceberg. Vended credentials are 15-minute tokens by default. A long-running PyIceberg scan that exceeds 15 minutes will fail with an opaque AWS auth error. Either increase the warehouse size on the source side so scans finish faster, or break the scan into smaller window queries that each complete inside the credential lifetime.

Time-travel windows getting silently extended. Setting DATA_RETENTION_TIME_IN_DAYS = 30 on a Snowflake-managed Iceberg table keeps 30 days of snapshots — and 30 days of data files referenced by those snapshots. Storage cost scales linearly with retention on Iceberg tables in a way that’s more visible than on FDN tables (because the files are in your bucket and on your bill). Start conservative; lengthen retention when you have a concrete reason.

AUTO_REFRESH thundering herd. For externally-managed tables with AUTO_REFRESH = TRUE, every query refreshes the catalog pointer. Under high QPS on a small warehouse, the catalog refresh latency dominates query time. The fix is to use a larger warehouse with concurrent query handling, or to drop AUTO_REFRESH and call ALTER ICEBERG TABLE ... REFRESH on an explicit schedule.

Schema evolution divergence. Iceberg supports adding columns, dropping columns, renaming columns, and reordering — all without rewriting data files. Snowflake’s DDL maps to those Iceberg operations, but the underlying field IDs that Iceberg uses to track columns through renames are managed by the catalog. If two engines (Snowflake and Spark) both try to evolve the same table’s schema simultaneously, you can end up with field ID conflicts that only one engine notices. Pick one engine as the source of schema changes per table and route all DDL through it.

A simple end-to-end example

Here’s a complete example that ties the pieces together — create an external volume, create a Snowflake-managed Iceberg table, load it, expose it to PyIceberg:

-- 1. External volume (one-time per bucket/region)
CREATE EXTERNAL VOLUME iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME             = 'us-east-1-iceberg',
      STORAGE_PROVIDER = 'S3',
      STORAGE_BASE_URL = 's3://acme-iceberg-prod/',
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeIcebergRole',
      STORAGE_AWS_EXTERNAL_ID = 'iceberg_external_id'
    )
  )
  ALLOW_WRITES = TRUE;

-- 2. Snowflake-managed Iceberg table
CREATE ICEBERG TABLE analytics.events (
  event_id     STRING,
  user_id      NUMBER,
  event_type   STRING,
  payload      VARIANT,
  occurred_at  TIMESTAMP_NTZ
)
  EXTERNAL_VOLUME = 'iceberg_vol'
  CATALOG         = 'SNOWFLAKE'
  BASE_LOCATION   = 'events/'
  PARTITION BY (DAY(occurred_at), event_type);

-- 3. Load data
COPY INTO analytics.events
FROM @raw_stage/events/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- 4. Enable compaction (on by default; set explicitly to be safe)
ALTER ICEBERG TABLE analytics.events
  SET ENABLE_DATA_COMPACTION = TRUE;

-- 5. Generate a Snowflake OAuth token for PyIceberg (in a separate session)
SELECT SYSTEM$GENERATE_SCOPED_OAUTH_TOKEN(...);

After that the table is queryable from Snowflake, addressable as analytics.events, and readable from PyIceberg using the OAuth token and the REST catalog URL. The same query pattern works in Spark with the Snowflake Iceberg catalog configured.

Frequently asked questions

Are Snowflake Iceberg tables slower than native FDN tables? For most read workloads, the difference is 10-20% on the same warehouse. Reads against a well-partitioned Iceberg table use Snowflake’s vectorized scanner and benefit from Iceberg’s manifest statistics for partition pruning. FDN tables still have a small edge from tighter integration with the proprietary file format. Where the gap is more noticeable is writes: every Iceberg write commits a new metadata snapshot, so many small writes compound overhead in a way they don’t on FDN. Batch writes and enable automatic compaction to close the gap.

Can other engines write to a Snowflake-managed Iceberg table? No — Snowflake-managed means Snowflake is the only writer. Other engines (Spark, Trino, PyIceberg, DuckDB) can read through the Iceberg REST catalog Snowflake exposes, with vended credentials handling the S3 access. If you need multi-writer access, use an externally-managed table backed by a catalog like Glue or Polaris that supports concurrent writes from multiple engines.

How does Snowflake time travel work on Iceberg tables? The same way as on FDN tables. SELECT ... AT(TIMESTAMP => ...) and SELECT ... AT(OFFSET => ...) both work, and the retention window is controlled by the table’s DATA_RETENTION_TIME_IN_DAYS setting. Behind the scenes, Snowflake uses Iceberg’s native snapshot history — every commit creates a snapshot, and time travel resolves to the snapshot closest to the requested time. Snapshots beyond the retention window are eligible for expiration, after which the underlying data files are deleted.

What’s the difference between Snowflake Open Catalog and Apache Polaris? Snowflake Open Catalog is a hosted, managed deployment of Apache Polaris. The underlying software is the same — Polaris is the open-source catalog Snowflake donated to the Apache Software Foundation in 2024. If you want Polaris semantics without operating a service, use Open Catalog. If you want full control or need to run inside a particular network boundary, deploy Polaris yourself. Catalog tables and configuration are portable between the two.

Do Iceberg tables support masking policies and row access policies? Snowflake-managed Iceberg tables support both, identically to FDN tables. The policy evaluates at query time inside Snowflake’s planner, so the same CREATE MASKING POLICY ... ALTER ICEBERG TABLE ... MODIFY COLUMN ... SET MASKING POLICY ... workflow works. Externally-managed tables can have policies attached, but they only apply when reading from Snowflake — other engines reading the same files through the catalog see the unprotected data. For sensitive columns, either keep the table Snowflake-managed or tokenize the column upstream.

Can I migrate a Snowflake-managed Iceberg table to externally-managed (or vice versa)? There’s no in-place conversion. The two flavors are different at the catalog level — Snowflake-managed tables live in Snowflake’s catalog and externally-managed tables live in an external catalog. To switch, create a new table of the target flavor, copy the data, and swap names. The data files can stay in the same S3 location if both volumes point to it, which makes the swap mostly a metadata operation, but the catalog entries are recreated.

How much extra does Iceberg cost compared to FDN? Compute is the same: the warehouse runs the same query plan against either format. Storage is similar for the table data itself (both use Parquet-style columnar layouts), but you pay the storage in your own S3 bucket for Iceberg rather than as a line item on the Snowflake bill — which often nets out cheaper because S3 list prices are competitive. The extra cost on Iceberg is the metadata: every snapshot keeps its referenced files alive, so long retention periods cost more storage than they would on FDN. Set DATA_RETENTION_TIME_IN_DAYS deliberately and run snapshot expiration.

Can I use Snowflake’s MERGE and DELETE on Iceberg tables? On Snowflake-managed Iceberg tables, yes — full DML including MERGE, UPDATE, DELETE works the same as on FDN tables. Internally these compile to Iceberg’s merge-on-read operations, which write delete files alongside the data files rather than rewriting full data files on every change. On externally-managed tables, DML support depends on the catalog: Glue catalog integrations support INSERT; full MERGE/UPDATE/DELETE on external tables is more limited and depends on whether the source catalog is configured for it.

What happens if Snowflake and Spark write to the same Iceberg table concurrently? For externally-managed tables, Iceberg’s optimistic concurrency model handles this: each writer commits a new snapshot referencing the previous snapshot, and the catalog rejects commits that don’t point at the current head. The losing writer retries. This works fine for low-contention workloads but degrades under heavy concurrent writes — the right pattern is to designate one writer per table or partition, not many. For Snowflake-managed tables, Snowflake is the only writer by definition, so concurrent-writer conflicts don’t apply.

Does Iceberg replace dbt-on-Snowflake? No — dbt is a transformation orchestrator and Iceberg is a table format. They compose well: dbt models can target Iceberg tables using the same +materialized: table config, and the dbt-snowflake adapter supports Iceberg natively as of 2024. The interesting pattern is using Iceberg to share dbt-built marts with external consumers (Spark, Trino) without copying the data out of Snowflake.

Wrapping up

Iceberg tables on Snowflake are the right default for any team that wants multi-engine access to warehouse data, or that wants warehouse data to land in their own S3 bucket instead of inside Snowflake’s internal stage. Snowflake-managed tables are the easier on-ramp — they give you nearly all of Snowflake’s feature set plus open-format storage with one DDL change. Externally-managed tables are the right call when another system already owns the data and writes are happening outside Snowflake.

The pieces that matter operationally are the external volume IAM setup, the catalog choice (Snowflake / Glue / Polaris depending on your existing stack), and a maintenance schedule that runs compaction, snapshot expiration, and orphan file cleanup. Get those right and the day-to-day experience is indistinguishable from native FDN with the upside of full open-format portability.

If you’re going further on Snowflake performance, the Snowflake clustering keys guide covers when clustering pays off on either format, the Snowflake query profile guide helps you read what queries are actually doing, and the Snowflake warehouse sizing guide covers the cost math that applies equally to Iceberg-table reads. If you’re layering AI on top — using these tables as the source for text-to-SQL — our guide to Snowflake Cortex Analyst covers the semantic model and the integration with Cortex Search for high-cardinality dimensions. And if your day-to-day workflow involves a lot of snapshot inspection, schema evolution, and cross-engine debugging, QueryPlane lets you build dashboards directly against the system tables that expose Iceberg metadata — including TABLE_STORAGE_METRICS, ICEBERG_STORAGE_OPTIMIZATION_HISTORY, and table-level partition statistics — without writing one-off scripts.