Snowflake Warehouse Sizing in Practice

Q: How do I stop a runaway warehouse from blowing the budget?

Attach a resource monitor with explicit SUSPEND and SUSPEND_IMMEDIATE triggers. SUSPEND at 100% lets running queries finish; SUSPEND_IMMEDIATE at 110% is the hard cap. Account-level monitors fire after warehouse-level ones, so attach a per-warehouse monitor for normal enforcement and a global account monitor as a safety net. NOTIFY triggers only send email — they do not stop anything.

How to pick Snowflake warehouse size: credits per hour, auto-suspend, multi-cluster scale-out, and the sizing mistakes that quietly burn budget.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.

Warehouse sizing is the single most consequential cost decision in Snowflake. Every other knob — auto-suspend timeout, multi-cluster min and max, the difference between Standard and Snowpark-optimized — exists to compensate for the fact that compute is billed by the second per warehouse, doubles in cost at every step up, and runs whether or not the queries on it are actually using the cores.

Most teams over-size at the start because the cost difference looks rounding-error small in a sandbox. It is not. A Large running 24/7 is 8x the bill of an X-Small running 24/7, and the gap from “we picked Medium because it felt safe” to “we right-sized to Small with multi-cluster scale-out” is routinely 40-60% of warehouse spend. That is the territory this post is about.

In this post, we’ll cover:

The warehouse sizes - all ten tiers and what they actually cost per hour
Scale-up vs scale-out - when bigger helps and when more clusters help instead
Auto-suspend and auto-resume - the cheapest knob, often the most misconfigured
Multi-cluster warehouses - how MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT actually interact
Resource monitors - the only way to enforce a budget
Snowpark-optimized warehouses - when the memory-heavy SKU is worth the premium
A sizing methodology - the order to test sizes in, and what to look at in QUERY_HISTORY

The warehouse sizes and what they cost

Snowflake virtual warehouses come in ten sizes. Each step up doubles the number of compute clusters underneath and doubles the credit cost per hour.

Size	Credits / hour	Relative cost	Typical use
X-Small	1	1x	Default for dev, BI tools, light queries
Small	2	2x	Most ad-hoc analyst queries, small dashboards
Medium	4	4x	Regular ELT loads, mid-sized aggregations
Large	8	8x	Heavier transforms, large scans
X-Large	16	16x	Big batch jobs, wide joins, end-of-day rollups
2X-Large	32	32x	TB-scale aggregations
3X-Large	64	64x	Multi-TB joins, large data science training reads
4X-Large	128	128x	Heavy reporting against very wide tables
5X-Large	256	256x	Petabyte-scale scans
6X-Large	512	512x	Niche workloads; rarely the right answer

A credit’s dollar price depends on edition and region — Standard runs around $2/credit on AWS US East, Enterprise around $3, Business Critical around $4 — but the relative shape is what matters: the cost curve is exactly geometric, so sizing decisions compound dramatically.

The crucial property of this curve is that doubling the size halves the runtime only when the query can actually use twice as many cores. Many real queries cannot. A SELECT ... LIMIT 10 against a small table will take roughly the same wall time on an X-Small as on a 4X-Large; you will just pay 128x more.

When scale-up actually helps

The thing that doubles when you go from Small to Medium is the number of compute servers (and therefore the number of cores and the amount of local SSD cache) backing the warehouse. The query optimizer parallelizes most operators across servers, so workloads that benefit are the ones with enough work to spread.

Scale-up is genuinely effective for:

Large table scans where the bottleneck is reading partitions in parallel.
Wide joins between large tables, where each server can handle a partition of the build side.
Window functions and large aggregations where partial aggregation parallelizes cleanly.
Queries that spill to remote disk on a smaller warehouse — going up a size often eliminates the spill and improves wall time more than 2x.

Scale-up is not effective for:

Small queries that finish in under a second on the smaller size. Doubling the size doubles the cost and changes nothing.
Queries gated on a serial step — single-threaded UDFs, certain JavaScript stored procedures, narrow filters that the optimizer can’t parallelize past one server.
Highly concurrent workloads where the warehouse is already fully utilized but each individual query is small. That is a scale-out problem, covered below.

The cheapest test for “does my query benefit from scale-up?” is to run it on the next size up and look at EXECUTION_TIME in QUERY_HISTORY. If wall time roughly halves, the doubled cost is a wash. If it drops by less than 30%, you are paying for empty cores.

Scale-out: multi-cluster warehouses

The other axis is concurrency. A single warehouse — regardless of size — has a MAX_CONCURRENCY_LEVEL parameter, default 8. When that many queries are running simultaneously, additional queries are queued. Going up a size does not fix this; the cap is per cluster, and a Large is still one cluster.

For concurrency you want a multi-cluster warehouse (Enterprise edition and above), which spins additional clusters of the same size up and down based on queueing:

ALTER WAREHOUSE bi_wh SET
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 5
  SCALING_POLICY = 'STANDARD';

STANDARD adds clusters aggressively when queries queue and shuts them down conservatively. ECONOMY is the inverse — it lets queries queue for ~6 minutes’ worth of estimated work before adding a cluster, which trades latency for cost. The right choice depends on whether the workload is latency-sensitive (BI dashboards: STANDARD) or batch (overnight loads: ECONOMY).

The mental model is: pick the smallest size that finishes any single query in acceptable wall time, then use multi-cluster to handle concurrency. A SMALL with MAX_CLUSTER_COUNT = 4 is cheaper at most concurrency levels than a MEDIUM with one cluster, and it scales gracefully when traffic spikes.

Auto-suspend and auto-resume

Auto-suspend is the most overlooked cost lever in Snowflake. A warehouse continues to bill credits while idle until it suspends, and per-second billing has a 60-second minimum every time it resumes. The two parameters that matter:

ALTER WAREHOUSE bi_wh SET
  AUTO_SUSPEND = 60          -- seconds idle before suspend
  AUTO_RESUME = TRUE;

The default AUTO_SUSPEND is 600 seconds (10 minutes). That default exists because a fresh resume rebuilds the local SSD cache, and a hot cache can cut query wall time meaningfully on repeated scans of the same partitions. Lower auto-suspend trades cache warmth for idle cost.

The right value is workload-dependent:

BI dashboards with bursty viewers: 60-120 seconds. Most viewer sessions cluster into bursts; the cache rarely survives across sessions anyway.
Always-on operational tools: 60 seconds. The first query after idle pays the resume cost; subsequent queries reuse cache.
Hourly batch ELT: 60 seconds — the warehouse is supposed to be off between runs.
High-frequency repeated queries against the same tables: 300-600 seconds, because cache hits are real value.

The 60-second minimum billing per resume means that if your warehouse oscillates suspend/resume every 30 seconds because of staggered queries, you are paying 60 seconds for 30 seconds of work. Look at WAREHOUSE_METERING_HISTORY and divide credits by query count to spot this pattern. If the answer is “we’re paying 1 credit per ~10 seconds of query work,” the auto-suspend is set too low for the traffic shape.

Resource monitors are how you cap spend

Auto-suspend bounds idle cost. Resource monitors bound active cost. Without one, a runaway query, a poorly-sized warehouse, or a programmatic loop kicking off jobs can burn through a month’s budget in hours, and Snowflake will not stop it.

CREATE RESOURCE MONITOR daily_etl_monitor
  WITH
    CREDIT_QUOTA = 1000
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
      ON 75 PERCENT DO NOTIFY
      ON 90 PERCENT DO NOTIFY
      ON 100 PERCENT DO SUSPEND
      ON 110 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = daily_etl_monitor;

Two things people miss:

SUSPEND lets running queries finish, then suspends. SUSPEND_IMMEDIATE cancels them. Most teams want both triggers — SUSPEND at 100% to stop the bleeding, SUSPEND_IMMEDIATE at 110% as a hard backstop.
Account-level monitors only fire after warehouse-level monitors. If you only have an account monitor and one warehouse runs away, the others keep working until the account total is hit. For most setups, attach a per-warehouse monitor and a global account-level safety monitor.

NOTIFY does nothing for users who haven’t enabled email notifications in their preferences, and it doesn’t pause anything. Use it for early warning, not enforcement.

See what QueryPlane can build for you

Connect to your database, write SQL with AI, and build shareable apps — all from your browser.

Get Started Book a Demo

Snowpark-optimized warehouses

The standard warehouse SKUs are tuned for SQL workloads — the ratio of memory to compute is what most analytic queries want. Snowpark-optimized warehouses are a separate SKU with 16x the memory per node, intended for memory-bound work: ML training, large UDFs, Iceberg writes that materialize wide intermediate state.

Two things to know before reaching for them:

They start at Medium (no X-Small or Small Snowpark-optimized) and cost roughly 1.5x the equivalent standard size in credits per hour, so a Medium Snowpark-optimized runs at 6 credits/hour vs 4 for a standard Medium.
For pure SQL workloads, a standard X-Large is almost always cheaper and faster than a Snowpark-optimized Medium, because the SQL optimizer can use the additional compute. Only switch to Snowpark-optimized when you have a measured memory pressure problem (queries spilling heavily, UDFs OOM-ing, ML training that doesn’t fit).

If you are not sure whether you have a memory problem, check BYTES_SPILLED_TO_LOCAL_STORAGE and BYTES_SPILLED_TO_REMOTE_STORAGE in QUERY_HISTORY. Remote spill is a strong signal that the warehouse is undersized for memory. Local spill is fine; remote spill kills wall time.

A sizing methodology that actually works

The cheapest way to right-size is binary search down, not up.

Start at X-Small. Run the workload. Most analyst queries against small fact tables (under ~10M rows) finish on X-Small in seconds.
Move up only when you have evidence. Look at QUEUED_OVERLOAD_TIME (queue time), BYTES_SPILLED_TO_REMOTE_STORAGE (memory pressure), and EXECUTION_TIME (wall time). If the workload spills remotely or wall time is unacceptable, and the query is parallelizable, go up one size.
Re-measure. If wall time dropped by less than 30% on the larger size, the larger size is wasteful — go back down.
For concurrency, add clusters, not size. If the issue is QUEUED_PROVISIONING_TIME or queries waiting on MAX_CONCURRENCY_LEVEL, the answer is multi-cluster, not bigger.
Separate workloads onto separate warehouses. ELT jobs, BI dashboards, and ad-hoc analyst queries have different latency profiles and different right sizes. One warehouse for everything is almost always wrong; the cost of a second small warehouse is far less than the cost of mis-sizing the first one.

The query you’ll run more than any other:

SELECT
  warehouse_name,
  query_type,
  COUNT(*) AS query_count,
  AVG(execution_time) / 1000 AS avg_exec_seconds,
  AVG(queued_overload_time) / 1000 AS avg_queue_seconds,
  SUM(bytes_spilled_to_remote_storage) AS total_remote_spill,
  AVG(credits_used_cloud_services) AS avg_cs_credits
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY query_count DESC;

That single query tells you which warehouses are queueing (oversized for concurrency, undersized for the workload), which are spilling (undersized for memory), and which are running mostly fast queries (probably oversized).

Pitfalls and gotchas

The 60-second minimum on resume is not a billing edge case. It is a real cost driver for warehouses that resume frequently. A warehouse that resumes 100 times a day to run 5-second queries spends 6,000 seconds (100 minutes) of billable time on 500 seconds of work — a 12x overhead. Either lengthen AUTO_SUSPEND or batch the work so resumes are less frequent.

MIN_CLUSTER_COUNT > 1 keeps that many clusters running 24/7. It is not a “minimum during business hours” setting. If you set MIN_CLUSTER_COUNT = 2 for a busy daytime workload, you pay for two idle clusters all night.

A suspended warehouse still costs money during cluster auto-resize. If MAX_CLUSTER_COUNT is high and traffic spikes, the additional clusters take ~1-2 seconds to provision and then bill at full rate. There is no “warm pool” — you are paying for the size you allow it to grow to, not the size it currently is.

Cloud Services credits are billed separately and capped at 10% of compute credits per day. Most queries’ Cloud Services overhead is free because of that 10% cap. Tiny queries that hit cached results — SELECT 1 style health checks, dashboard polling — burn cloud services credits without any compute. If credits_used_cloud_services is a meaningful fraction of credits_used_compute for a warehouse, the workload is dominated by metadata-only queries and you should question whether they need to run at all.

Standard scaling policy is wrong for batch. STANDARD adds clusters as soon as queries queue, which means a batch that spawns 50 queries simultaneously will spin up MAX_CLUSTER_COUNT clusters within seconds, all of which are billed at the new size’s rate. For batch ELT, set SCALING_POLICY = 'ECONOMY' so the system absorbs the spike on existing clusters and only scales out when queueing actually persists.

Don’t size warehouses for the worst case. Snowflake’s per-second billing is built for variable sizing. The pattern of “right-size for normal load, scale up briefly for the end-of-day rollup, scale back down” — done by issuing ALTER WAREHOUSE ... SET WAREHOUSE_SIZE = ... from the orchestrator — costs less than running the larger size all day, and it’s how Snowflake expects you to use the system.

Frequently asked questions

What size Snowflake warehouse should I start with? X-Small. The default for new warehouses, the cheapest size, and big enough for most ad-hoc analyst queries against tables under ~10M rows. Moving up should always be in response to measured evidence — wall time, queueing, or remote spill — not anticipation. Doubling the size doubles the cost; do it once you have data showing the larger size halves wall time on the queries that matter.

Does a bigger Snowflake warehouse make every query faster? No. A bigger warehouse adds parallel compute resources, which only helps queries that can use them. Small queries against small tables don’t get faster on a Large warehouse — they take roughly the same time and cost 8x more. Look at EXECUTION_TIME in QUERY_HISTORY before and after a size change; if wall time dropped by less than 30%, the larger size is paying for empty cores.

What’s the difference between a multi-cluster warehouse and a bigger warehouse? A bigger warehouse (scale-up) gives each query more compute to run faster. A multi-cluster warehouse (scale-out) adds more clusters of the same size to handle more concurrent queries. Use scale-up when individual queries are slow because of large scans or wide joins. Use scale-out when a single warehouse hits its MAX_CONCURRENCY_LEVEL and queries start queueing. Most BI workloads benefit more from multi-cluster than from a bigger size.

How does Snowflake auto-suspend billing actually work? A warehouse bills credits per second while running, suspends after the AUTO_SUSPEND idle period, and stops billing entirely while suspended. There’s a 60-second minimum charge each time it resumes. So a warehouse that resumes 100 times a day for 5-second queries pays for 100 minutes of compute on 8 minutes of actual work. Lengthen AUTO_SUSPEND if you see frequent short resumes, or batch the work so the warehouse stays warm.

Can I have one warehouse for everything? You can, but it is almost always cheaper to split workloads. A warehouse sized for end-of-day rollups is wasteful for analyst queries the rest of the day. A warehouse sized for analysts is too small for the rollups. Two right-sized warehouses cost less than one mis-sized warehouse, and they isolate concurrency so a heavy ELT job doesn’t queue dashboard queries.

What’s the difference between a Snowpark-optimized warehouse and a standard one? Snowpark-optimized warehouses have 16x the memory per compute node and are intended for memory-bound workloads — ML training, large UDFs, Iceberg writes with wide intermediate state. They start at Medium, no X-Small or Small option, and cost roughly 1.5x the standard equivalent. For pure SQL, a standard X-Large is almost always cheaper than a Snowpark-optimized Medium. Only switch when you see significant remote spill or UDFs running out of memory.

How do I stop a runaway warehouse from blowing the budget? Attach a resource monitor with explicit SUSPEND and SUSPEND_IMMEDIATE triggers. SUSPEND at 100% lets running queries finish; SUSPEND_IMMEDIATE at 110% is the hard cap. Account-level monitors fire after warehouse-level ones, so attach a per-warehouse monitor for normal enforcement and a global account monitor as a safety net. NOTIFY triggers only send email — they do not stop anything.

What does MIN_CLUSTER_COUNT actually do? It keeps that many clusters running 24/7, regardless of traffic. It is not a “minimum during business hours” knob. Setting MIN_CLUSTER_COUNT = 2 means you pay for two clusters every minute of every day. Use MIN_CLUSTER_COUNT = 1 and rely on auto-resume + multi-cluster scale-out unless you have a measured cold-start latency problem on the first query.

Wrapping up

Snowflake warehouse sizing is one of the rare areas where the right answer is almost always smaller than your instinct says. Start at X-Small, move up only when measured evidence demands it, separate workloads onto purpose-sized warehouses, and lean on auto-suspend, multi-cluster scale-out, and resource monitors to absorb variability without paying for it 24/7.

The metrics in QUERY_HISTORY and WAREHOUSE_METERING_HISTORY are the source of truth — EXECUTION_TIME, QUEUED_OVERLOAD_TIME, BYTES_SPILLED_TO_REMOTE_STORAGE, and credits per query are the four numbers that tell you whether the warehouse is too big, too small, or just right. Size to the data, not the hypothesis.

If you’re spending real time inside Snowflake every day, our guides to Snowflake dynamic tables, streams and tasks, clustering keys, the search optimization service, and stages, COPY INTO, and Snowpipe cover the other levers that compound with right-sizing — declarative pipelines, fine-grained CDC, partition pruning, point-lookup acceleration, and the ingestion shapes that drive how much warehouse you actually need. And if you want a fast SQL editor for exploring QUERY_HISTORY and prototyping warehouse changes, the QueryPlane Snowflake integration connects in a few minutes.