Menu
Blog Documentation Community Pricing Demo Call Sign Up
Sign Up

Reading the Snowflake Query Profile in Practice

How to read the Snowflake query profile: operator nodes, pruning ratios, spilled bytes, exploding joins, and the most common bottlenecks that show up before you reach for more warehouse.

Snowflake

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.


Almost every Snowflake performance problem — the slow dashboard, the expensive nightly ELT, the query that fails with “out of memory” — gets solved by reading the query profile. It is the only place Snowflake tells you, at operator-level granularity, how a query actually ran: which scans pruned, which joins exploded, which step spilled to remote storage, and where the wall-clock time really went. The cost-attribution tools above it (QUERY_HISTORY, ACCOUNT_USAGE, warehouse metering) tell you which query is expensive. The profile tells you why.

The catch is that the profile is information-dense and the layout has changed at least twice in the past two years. Engineers who can read it solve performance issues in minutes. Engineers who can’t end up resizing the warehouse, which sometimes helps and usually wastes credits. This post walks through how to actually read it.

In this post, we’ll cover:

  • Finding queries - using QUERY_HISTORY and the Activity page to land on the right run
  • The profile layout - operator tree, statistics panel, and the most-expensive-nodes summary
  • The operators that matter - TableScan, Join, Aggregate, Sort, Window, and what their costs mean
  • Pruning - the single most important number in the profile, and how to make it go up
  • Spillage - local vs remote spill, and what each tells you about warehouse sizing
  • Query tags - tagging workloads so you can attribute costs in QUERY_HISTORY later
  • A tuning methodology - the order to investigate in, from QUERY_HISTORY down to a single operator

Finding the right query

Before you can read a profile you have to find the right query. There are three entry points.

The Query History page under Activity in Snowsight is the most convenient for ad-hoc investigations. Filter by warehouse, user, status, or a substring of the SQL text, click into any row, and the profile is one tab over. It only shows your account’s recent queries — useful for “the query I ran two minutes ago”, not for trend analysis.

The SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view is the source of truth for everything older than 7 days, with up to a 45-minute latency and a 365-day retention window. This is what you query when you want to find the most expensive queries, the slowest ones, or the ones that spilled in the last 30 days. The columns most teams ignore are the ones most worth knowing:

SELECT
  query_id,
  user_name,
  warehouse_name,
  warehouse_size,
  total_elapsed_time / 1000      AS wall_seconds,
  execution_time / 1000          AS exec_seconds,
  compilation_time / 1000        AS compile_seconds,
  queued_overload_time / 1000    AS queued_seconds,
  bytes_scanned,
  partitions_scanned,
  partitions_total,
  bytes_spilled_to_local_storage,
  bytes_spilled_to_remote_storage,
  credits_used_cloud_services,
  query_tag,
  LEFT(query_text, 120)          AS sql_preview
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'ANALYTICS_WH'
  AND total_elapsed_time > 30000
ORDER BY total_elapsed_time DESC
LIMIT 50;

That query is the workhorse of Snowflake performance tuning. Three columns in particular drive most decisions: partitions_scanned / partitions_total (pruning effectiveness), bytes_spilled_to_remote_storage (memory pressure), and queued_overload_time (concurrency pressure). We’ll come back to each.

The third entry point is GET_QUERY_OPERATOR_STATS, a table function that returns the operator tree of a query as a queryable result set. This is what you use when you want to programmatically detect bad patterns — for example, finding every query in the last week where a join produced more rows than both inputs combined.

What the profile actually shows

The profile is three things laid out on one screen.

The operator tree on the left is a directed acyclic graph of physical operators — TableScan, Filter, Join, Aggregate, Sort, Window, Result, and so on. Each node shows its percentage of total execution time and, on hover, its input and output row counts. The tree reads bottom-up: leaf scans feed joins, joins feed aggregates, the final result feeds Result. Time spent in a node includes its children, so the percentages don’t sum to 100 — you read them as “which subtree dominates”.

The statistics panel on the right summarizes the query: bytes scanned, rows produced, partitions scanned vs total, bytes spilled, and the breakdown of execution time across processing, IO, network, and the synchronization overhead Snowflake calls “Initialization”. For most queries the time is dominated by Processing or IO; large network or initialization time is a sign of a non-obvious problem.

The “Most Expensive Nodes” callout at the top right is the single thing most engineers look at. It surfaces the top three operators by exec time, which is usually enough to start a hypothesis. If the most expensive node is a TableScan, the problem is data volume or pruning. If it is a Join, the problem is cardinality or join order. If it is a Sort or Window, the problem is almost always memory and ordering. Each implies a different fix.

The thing the profile does not show by default is per-step bytes processed; you have to hover the node to see “Bytes scanned” and “Bytes written”. Get used to hovering — most of the actionable data lives in the tooltips.

The operators that matter

You can read a profile well by knowing five operators deeply.

TableScan is the leaf node of every read. The two numbers that matter on a TableScan are partitions scanned vs partitions total (lower ratio = better pruning) and bytes scanned. A scan that reads 100% of partitions on a table with a clustering key configured means the predicate didn’t match the clustering. A scan that reads 1% of partitions on a 10 TB table is doing exactly what micro-partitions are for. If a TableScan is the most-expensive node and the partition ratio is high, the fix is almost always a better predicate or a clustering key change, not a bigger warehouse.

Join is the operator most likely to surprise you. Snowflake’s profile shows the join algorithm — usually HASH JOIN, occasionally MERGE JOIN — and the input row counts on each side. The classic pathology is the cardinality blow-up: a join that takes 10M rows on the build side and 1M rows on the probe side and produces 800M rows on the output. That is a sign of a bad join key (duplicates) or a missing predicate. The profile makes it obvious because the output row count on the join node is visible on hover, but only if you check.

Aggregate comes in two flavors: streaming (HashAggregate for GROUP BY) and global (a final reduction across compute servers). The streaming variant spills when the hash table doesn’t fit in memory. Look at Bytes spilled to local storage on the node — if it is nonzero, the aggregate is memory-bound, which usually means a wider grouping key than the table can hold. The fix is either a smaller grouping (do you really need GROUP BY user_id, session_id, event_ts?) or a bigger warehouse.

Sort is the operator that most reliably leads to remote spill. Sorting is O(n log n) in memory if the data fits and O(n^2 / memory) when it doesn’t, because Snowflake has to write sort runs to remote SSD and merge them. A Sort operator with remote spillage is the single strongest signal that the warehouse is undersized for the workload — or that the sort doesn’t need to exist. ORDER BY in a CTE that the optimizer can’t prove is unnecessary will run a sort over the full dataset; removing it is often a 10x speedup.

Window functions internally sort and partition. The same memory pressure that hurts Sort hurts Window, plus you pay for the partition repartitioning step (WindowFunction shows Window partition by ... on hover). Anti-pattern: a window function over PARTITION BY user_id on a 2B-row event table running on a Small warehouse. Memory pressure is guaranteed; the warehouse is the wrong size for that workload regardless of how clever the query is.

Two other nodes worth recognizing even if they’re rare to optimize. InternalObject and ExternalFunction show up when a query calls a UDF or external function; UDF time is usually opaque to the profile and is best diagnosed by re-running the UDF body inline. Result is the final node and almost never an optimization target — if Result shows meaningful exec time it usually means the client (BI tool, JDBC driver, Python script) is slow to receive rows, not Snowflake.

Pruning is the single most important number

If you read one number in the profile, read the TableScan ratio of partitions scanned to partitions total. Snowflake stores tables as immutable micro-partitions of around 16 MB compressed, and the optimizer skips any partition whose min/max metadata for the filtered columns can’t match the predicate. Good pruning means scanning 1% of a 10 TB table. Bad pruning means scanning all 10 TB and then filtering.

The lever for pruning is the clustering of the table. New tables are clustered by load order, which is fine for time-series workloads where most queries filter by event_ts and rows are loaded roughly in order — pruning is almost free. It is not fine for tables that get filtered on a different column than the load order, in which case you set a clustering key and pay for an automatic reclustering service to maintain it.

A clustering key is the right answer when three things are true. First, the table is large — typically over a few hundred GB — because reclustering has a meaningful credit cost and isn’t justified on small tables. Second, queries consistently filter on the same column or small set of columns. Third, the load order doesn’t already correlate with that column. If you’re filtering by customer_id on a 5 TB events table and rows are loaded by time, clustering on customer_id will move pruning from 100% of partitions to single-digit percent, and the query speedup typically pays for the reclustering credits within days.

If you want the same effect without paying for reclustering on a table you query infrequently, the search optimization service does point-lookup acceleration through a separate index structure rather than physical reordering. We’ve covered when each makes sense in that guide.

See what QueryPlane can build for you

Connect to your database, write SQL with AI, and build shareable apps — all from your browser.

Spillage tells you about warehouse sizing

Two columns in QUERY_HISTORY capture memory pressure: bytes_spilled_to_local_storage and bytes_spilled_to_remote_storage. Both also appear on the relevant operator in the profile (typically Sort, Aggregate, or Window).

Local spill is mild and roughly free. When an operator runs out of warehouse memory, Snowflake writes to the local SSDs attached to the compute servers. SSDs are fast enough that local spill on a few hundred MB barely registers in wall time. You will see local spill on plenty of working queries; ignore it unless it correlates with slow runs.

Remote spill is the warning sign. When local SSD fills, Snowflake writes to remote storage (S3, GCS, Azure Blob), which is one to two orders of magnitude slower. A query with significant BYTES_SPILLED_TO_REMOTE_STORAGE is the textbook case for scale-up: the next warehouse size has roughly double the per-server memory and will often eliminate the spill entirely, dropping wall time by 5-20x. The credit math almost always works out — the larger warehouse costs 2x per hour but runs the query in a tenth of the time.

A query that spills remotely on every size up to 4X-Large is a different problem. That is a query-shape problem (SELECT * ORDER BY ... on a 1 TB table with no LIMIT, an over-broad GROUP BY, a Cartesian join you didn’t realize was Cartesian) and the answer is to fix the SQL, not buy bigger compute. The profile will usually show the offending operator clearly — the spill happens on a single node, and that node’s input row count or grouping cardinality is the giveaway.

Query tags for cost attribution

By default, QUERY_HISTORY tells you which user ran a query and on which warehouse. That is rarely enough to attribute cost to a team, a pipeline, or a customer. The QUERY_TAG session parameter is the lightweight way to add the missing dimension:

ALTER SESSION SET QUERY_TAG = 'dbt:nightly:orders_fact';
-- queries from this session land in QUERY_HISTORY with the tag

Tags propagate to every query executed in the session and show up in the QUERY_TAG column of QUERY_HISTORY. The best practice is to set tags at the boundaries — once per dbt run, per Airflow task, per Sigma dashboard refresh — using a structured format like tool:pipeline:step so you can group and filter on substring matches.

Once tags are flowing, the cost report writes itself:

SELECT
  SPLIT_PART(query_tag, ':', 2)    AS pipeline,
  COUNT(*)                          AS query_count,
  SUM(total_elapsed_time) / 1000    AS total_seconds,
  SUM(credits_used_cloud_services)  AS cloud_services_credits
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND query_tag ILIKE 'dbt:%'
GROUP BY 1
ORDER BY total_seconds DESC;

The OBJECT_CONSTRUCT variant of QUERY_TAG works too — set the tag to a JSON object and parse it out later with PARSE_JSON. Tags are free to set and cost nothing to query; the value comes from the discipline of always setting them.

A tuning methodology

Performance tuning a Snowflake workload follows a fixed order. Skipping steps wastes credits.

Step 1: start with QUERY_HISTORY, not the profile. You can’t tune a query you can’t find. Run the workhorse query above with a WHERE clause matching your problem — a date range, a warehouse, a user, a tag — and sort by total_elapsed_time or bytes_scanned depending on whether the problem is wall time or cost. Get a single query_id to investigate.

Step 2: open the profile. On the query detail page, the Profile tab loads the operator tree. Glance at “Most Expensive Nodes” first; that gets you a hypothesis in five seconds. Then look at the statistics panel: pruning ratio, bytes scanned, bytes spilled. Three numbers, three hypotheses.

Step 3: investigate the suspect node. Click into it. Look at input rows, output rows, bytes scanned, bytes spilled, partitions scanned. The combination usually tells the story. Output-much-larger-than-inputs on a Join is a cardinality blow-up. Partitions-scanned-equal-to-total on a TableScan is a pruning miss. Remote-spill on a Sort is a memory miss.

Step 4: fix the SQL or the table layout first, the warehouse last. A schema or query change costs nothing to deploy and the benefit compounds across every future run. A warehouse resize is reversible but bills continuously. Increase the size when measured evidence shows the smaller size spills remotely, not because the query “feels slow”.

Step 5: re-measure. Run the query again. Confirm EXECUTION_TIME dropped meaningfully, confirm spill is gone, confirm pruning improved. If it didn’t, your hypothesis was wrong — go back to step 2 on the new profile.

This is also the loop for cost work. Slow queries and expensive queries have most of the same causes; the only difference is whether the symptom is wall time or credits. Both are visible in QUERY_HISTORY and both drill into the same profile.

Common patterns and what they mean

A handful of profile shapes recur often enough to recognize.

The Cartesian explosion is a join whose output row count is the product of its inputs. The profile shows it instantly: a Join node with millions of input rows on both sides and tens of billions on the output. The cause is usually a missing predicate (an unintended CROSS JOIN) or a join key with duplicates on both sides. The fix is to add the missing predicate or deduplicate one side first.

The all-partitions scan is a TableScan showing partitions_scanned = partitions_total on a large table. The query is filtering on a column the table isn’t clustered on. Either add a clustering key or rewrite the query to filter on a column that prunes (often a date filter on a time-clustered table).

The remote-spill sort is a Sort or Window node with BYTES_SPILLED_TO_REMOTE_STORAGE in the hundreds of MB or higher. Either size up the warehouse, eliminate the sort if it’s incidental, or split the work into smaller batches with explicit WHERE predicates.

The over-broad GROUP BY is an Aggregate node where input row count and output row count are roughly equal. A GROUP BY that doesn’t reduce cardinality is paying for hashing without getting any aggregation benefit. The usual cause is grouping by a column that’s effectively a primary key already; remove it from the GROUP BY and the aggregate vanishes.

The compile-bound query is a query where compilation_time is significant relative to execution_time — for example, 5 seconds to compile a query that runs in 2 seconds. This happens with very complex SQL (deep nested CTEs, many joins) or stale result caches. The fix is usually to simplify the SQL or to materialize intermediate steps in dynamic tables so the optimizer has less to work with at runtime.

The cloud-services-heavy workload is a warehouse where credits_used_cloud_services is approaching 10% of credits_used_compute (above which Snowflake charges for cloud services). This is a sign of metadata-heavy workload — health-check queries, dashboard polling, SELECT 1 style queries — and the answer is usually to consolidate or eliminate the polling rather than tune any individual query.

Frequently asked questions

Where do I find the query profile in Snowflake? Open the Activity > Query History page in Snowsight, click any query row, then click the Query Profile tab. You can also get there from a SQL worksheet by clicking the query ID in the result panel after a run. The profile is available for any completed query whose retention window hasn’t expired (60 minutes for cached failure detail, 14 days for the in-account profile, 365 days for the metadata in ACCOUNT_USAGE).

What is the difference between INFORMATION_SCHEMA.QUERY_HISTORY and SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY? INFORMATION_SCHEMA.QUERY_HISTORY is a table function that returns up to 7 days of recent queries with no latency, scoped to the current user and warehouse by default. ACCOUNT_USAGE.QUERY_HISTORY is a view covering 365 days of history with up to 45-minute latency, scoped to the whole account. Use INFORMATION_SCHEMA for “what just happened on this warehouse”; use ACCOUNT_USAGE for trend analysis and cost reporting.

How do I tell if a Snowflake query is expensive or just slow? Wall time (total_elapsed_time) and credit cost are correlated but not identical. A 30-second query on an X-Small costs roughly 1/32 of a 30-second query on a 2X-Large; the wall time is the same but the credits are not. Sort by total_elapsed_time to find slow queries, by bytes_scanned to find cost-heavy reads, and by (execution_time / 1000) * <credits_per_hour_for_size> to approximate credit cost when you need a rough dollar number per query.

What does it mean when a query “spills to remote storage” in Snowflake? The operator (usually Sort, Aggregate, or Window) ran out of warehouse memory, filled the local SSD on each compute server, and started writing to remote object storage (S3/GCS/Azure Blob). Remote spill is 10-100x slower than local spill and is the strongest signal that the warehouse is too small for the workload. The fix is almost always to size up; the only exception is when the SQL itself is the problem — a sort over a 1 TB result, an over-broad GROUP BY, or a Cartesian join.

How do I improve partition pruning in Snowflake? Filter on the column the table is clustered on, and make sure that column has enough cardinality to be useful for pruning. If queries consistently filter on a column the table isn’t clustered on, set a clustering key on that column (or expression like DATE_TRUNC('day', event_ts)) and Snowflake’s automatic clustering service will maintain the order. For very large tables where you only do point lookups, the search optimization service is the alternative.

What is a Snowflake query tag and how do I use it? QUERY_TAG is a session parameter that attaches an arbitrary string to every query the session executes. Set it once per pipeline run (ALTER SESSION SET QUERY_TAG = 'dbt:nightly:orders_fact') and use it as a GROUP BY dimension in ACCOUNT_USAGE.QUERY_HISTORY for cost attribution. Tags are free, propagate automatically, and are the single highest-leverage piece of operational hygiene you can adopt on a multi-team Snowflake account.

Should I use EXPLAIN in Snowflake? EXPLAIN returns the logical plan before the query runs and is useful when you can’t or don’t want to execute the query — for example, to verify that a clustering key will be used by the optimizer before reorganizing a multi-TB table. The query profile after execution is more informative for tuning, because it shows actual row counts, bytes scanned, and partition pruning. Use EXPLAIN as a sanity check; use the profile to find real problems.

How do I track Snowflake query costs over time? Combine QUERY_HISTORY with WAREHOUSE_METERING_HISTORY and QUERY_TAG. QUERY_HISTORY gives you per-query attribution, WAREHOUSE_METERING_HISTORY gives you per-warehouse credit consumption, and tags give you the team or pipeline dimension. A daily report grouped by (warehouse_name, SPLIT_PART(query_tag, ':', 2)) over the last 30 days is the dashboard most data teams converge on; once you have it, surprises in the Snowflake bill stop happening.

Wrapping up

The query profile is the diagnostic instrument that connects Snowflake’s cost surface (QUERY_HISTORY, ACCOUNT_USAGE, warehouse credits) to its performance surface (operator trees, pruning, spill). Engineers who can read it diagnose slow and expensive queries in minutes and can tell the difference between “this query needs a clustering key” and “this warehouse is too small”. Engineers who can’t usually end up resizing the warehouse and hoping.

The shortlist of habits that pay back fastest: always set QUERY_TAG at pipeline boundaries, always check pruning before considering a warehouse resize, treat remote spill as the signal it is, and start every tuning session in QUERY_HISTORY rather than the profile so you investigate the right query. The rest is pattern recognition that comes with reading more profiles.

If you’re working in Snowflake every day, our guides to warehouse sizing, dynamic tables, streams and tasks, and the search optimization service cover the levers most often used in response to what the profile reveals. And if you want a fast SQL editor for prototyping QUERY_HISTORY reports and turning them into shared internal dashboards, the QueryPlane Snowflake integration connects in a few minutes.