BigQuery Query and Cost Optimization in Practice
How to tune BigQuery cost and performance: on-demand vs editions, reading the query plan, slot recommender, materialized views, scheduled queries, and the INFORMATION_SCHEMA queries that actually find waste.
BigQuery
This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.
The thing that surprises most teams when they get serious about a BigQuery bill is how little of it is about query correctness. A query that returns the right answer can scan 4 TB instead of 40 GB, sit through ten minutes of shuffle, or trigger a hundred dollars of slot-time, and you will not notice from the result. The only signal is the bill at the end of the month — and by then the slow query is in a dashboard, a scheduled job, and a half-dozen downstream views, all happily firing it every morning.
This post is about the levers that actually move that bill. It is not a syntax reference: the SQL is straightforward, the pricing page is one search away, and Google’s docs are excellent. The hard part is choosing the right pricing model for your workload, learning to read the query plan without lying to yourself about what “slot time” means, knowing when a materialized view will pay for itself, and structuring the warehouse so the queries you cannot rewrite still do not blow your budget.
In this post, we’ll cover:
- On-demand vs editions — the pricing model decision and when each one wins
- How the query planner allocates work — slot-time, stages, and shuffled bytes
- Reading the query plan — what to look at first when a query is slow or expensive
INFORMATION_SCHEMA.JOBS— the one table that exposes the cost of everything- The slot recommender and autoscaler — when to use them and what they assume
- Materialized views and BI Engine — when pre-computation actually pays back
- Scheduled queries and resource sharing — keeping batch jobs out of interactive slots
- Partition pruning verification — proving the planner is doing what you think
- Common pitfalls — the patterns that quietly burn money
On-demand vs editions: pick the model that matches your workload
BigQuery has two pricing models for compute, and the choice between them sets the ceiling on every other decision you make. You can mix them at the project level, but most teams end up standardizing on one.
On-demand pricing charges you per bytes scanned — currently $6.25 per TiB after the 1 TiB monthly free tier (US multi-region). You get a shared pool of up to 2,000 slots for your queries, and you do not see them, manage them, or reserve them. The advantages are simplicity and a cost that scales perfectly with usage: if nobody runs queries today, today is free. The disadvantage is that the marginal cost of a poorly-written query is enormous — the same query that costs $0.02 against a 4 GB partition costs $25 against a 4 TB unpartitioned table — and you have no upper bound at the project level unless you manually configure custom quotas.
BigQuery editions (Standard / Enterprise / Enterprise Plus) charges per slot-hour, with reservations you provision and assign to workloads. You pick the edition (each is a feature tier, not a performance tier), pick a slot count, and the reservation autoscaler raises or lowers the slot allocation between your baseline and max settings as load demands. You can also commit to a 1-year or 3-year term for a 20%-40% discount.
The rule that holds up in practice: editions wins as soon as your steady-state slot usage exceeds about 100 baseline slots, because the per-slot-hour cost of a reservation is significantly cheaper than the equivalent bytes-scanned cost at that scale. Editions also wins when your bill is unpredictable from one month to the next — a reservation with a max ceiling cannot exceed that ceiling, so a runaway query that would have cost $400 on-demand will instead just queue. On-demand wins when usage is bursty and small (you genuinely have weeks with zero queries), or when you have a small number of high-value queries and want to amortize them across the free tier.
The mistake is staying on on-demand long after you should have switched. The signal is when the same dashboard query stops being free: if your team is hitting the 1 TiB free tier within the first week of the month, editions is going to be cheaper for you, full stop.
How the planner allocates work
Every BigQuery query you submit is compiled into a DAG of stages. A stage is a unit of parallel work — a scan, a filter, an aggregation, a join — and the planner picks how many workers to run inside it based on how many slots are available and how big the input looks. The total cost of a query, in editions, is the sum of slot-seconds across all stages. In on-demand, you pay for bytes scanned at the leaves, and the slots are bundled in.
A “slot” in BigQuery is the abstract unit of compute the planner divides work into; one slot is roughly one CPU’s worth of work for one second. A reservation with 500 slots can run 500 stage-workers concurrently, and a single complex query routinely uses hundreds of slots for a few seconds. The same query against the same data with twice the slots usually finishes faster — but only up to the point where the planner cannot parallelize the next stage any further. Past that point, extra slots sit idle and cost nothing in editions (because they are inside your reservation) but also do not help.
The three numbers worth memorizing for any query you care about:
- Slot-time consumed — total CPU-seconds across the query. This is the only proxy for “how much this query actually costs in editions” and the right number to optimize against. It is exposed in
INFORMATION_SCHEMA.JOBSastotal_slot_ms. - Bytes shuffled — the volume of data the planner had to move between workers during execution. Shuffled bytes are not free; large shuffles indicate skewed joins or aggregations that would benefit from clustering, repartitioning, or filtering earlier. Exposed as
total_bytes_processed(read) and the shuffle-related fields inJOBS_BY_PROJECT. - Wall-clock duration — how long a human waited. Worth tracking separately because a query can have low slot-time and high wall-clock (single-stage bottleneck) or high slot-time and low wall-clock (well-parallelized but expensive). The two failure modes have different fixes.
If you only ever look at one of these, look at slot-time. It is the right input to “is this query worth optimizing” and the only one that maps directly to your bill.
Reading the query plan
Open a query in the BigQuery console, run it, and click the Execution graph tab. You get the DAG, each node is a stage, and the relative width of each stage’s bar is its slot-time. The stages that are wide are where time and money go.
For any expensive stage, look at four things:
- Records read vs records written. A stage that reads 10B rows and writes 100M has a 100:1 filter ratio. Good. A stage that reads 10B and writes 10B is passing every row downstream — usually a join or a window function whose partition spec is too coarse.
- Compute ratio (avg vs max worker time). The execution graph shows the average and the slowest worker for each stage. A 10x ratio between average and max is skew — one worker did far more work than the others, and the stage was bottlenecked on it. The fix is almost always either repartitioning the input or pre-aggregating the skewed key.
- Wait ratio. “Wait” time on a stage is time workers spent waiting for upstream stages or for slot allocation. Wait time is what you reduce by adding more slots (in editions) or by structuring the query so fewer stages depend on each other serially.
- Shuffle volume. Each repartition step has a shuffled-bytes number attached. A stage that shuffles 200 GB to join two 10 GB tables is doing something wrong — usually a
JOINwithout a clustering or partition hint, or a broadcast that didn’t happen.
The pattern that catches the most production issues: a JOIN between a large fact and a “small” dimension where the dimension is actually 8 GB and not small. BigQuery decides between broadcast and shuffle joins based on input size estimates, and when the dimension is on the edge, the planner picks shuffle and you pay for it. The fix is to filter the dimension to the rows that can possibly join, materialize that as a CTE or temp table, and let the planner re-estimate.
See what QueryPlane can build for you
Connect to your database, write SQL with AI, and build shareable apps — all from your browser.
INFORMATION_SCHEMA.JOBS is the cost table
The single most useful query against any BigQuery account is SELECT * FROM region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY). That is your cost ledger. Every query, scheduled query, DML statement, and reservation autoscale decision shows up there with bytes scanned, slot-time, duration, user, and the SQL text.
The five rollups worth keeping as a saved view:
-- Top 20 queries by slot-time over the last 7 days
SELECT
query,
COUNT(*) AS runs,
SUM(total_slot_ms) / 1000 / 3600 AS slot_hours,
SUM(total_bytes_processed) / POW(1024, 4) AS tib_scanned,
AVG(TIMESTAMP_DIFF(end_time, start_time, MS)) AS avg_wall_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND state = 'DONE' AND job_type = 'QUERY'
GROUP BY query
ORDER BY slot_hours DESC
LIMIT 20;
This single query is what tells you the difference between “we have a cost problem” and “we have one query that has a cost problem.” In every BigQuery account I have seen audited, more than half the slot-time goes to fewer than ten distinct queries. Fix those and the bill drops.
A second rollup that finds waste differently:
-- Queries with high bytes-scanned but low result size — the "scanned 4 TB
-- to count 12 rows" pattern.
SELECT
user_email,
query,
total_bytes_processed / POW(1024, 3) AS gb_scanned,
total_bytes_billed / POW(1024, 3) AS gb_billed,
cache_hit,
start_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND job_type = 'QUERY' AND state = 'DONE'
AND total_bytes_processed > 100 * POW(1024, 3)
AND statement_type = 'SELECT'
ORDER BY gb_scanned DESC
LIMIT 50;
You can label every query with --@@labels syntax in scheduled queries, or set job labels on the client. Labels show up in JOBS_BY_PROJECT.labels and they are the cleanest way to attribute cost to a team, dashboard, or pipeline. If you have any chargeback model at all, label every query that goes to production.
The third rollup is the cache rate:
-- Cache hit rate over the last 7 days. If it is not above 30% for
-- repeat-heavy workloads, something is invalidating the cache.
SELECT
COUNTIF(cache_hit) / COUNT(*) AS hit_rate,
COUNT(*) AS total_queries
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND job_type = 'QUERY' AND state = 'DONE';
BigQuery’s query cache is per-user and free — a cache hit returns the prior result and costs nothing in bytes or slots. The cache is invalidated when any source table or view changes, when the query references nondeterministic functions like CURRENT_TIMESTAMP(), when the query uses DML, or when the result is larger than the cache limit. Teams routinely break their cache by wrapping every dashboard query in a WHERE event_time > CURRENT_TIMESTAMP() - INTERVAL 7 DAY — that predicate makes the query nondeterministic and disables caching. The right pattern is WHERE event_time > TIMESTAMP '2026-06-01 00:00:00' with the boundary computed in the dashboarding layer.
The slot recommender and autoscaler
If you are on editions, the most important non-query knob is the size of your reservation. Too few slots and queries queue; too many and you are paying for idle compute. BigQuery offers two tools that together make the sizing decision close to automatic.
The slot recommender sits at the project-and-organization level and tells you, based on your last 30 days of INFORMATION_SCHEMA.JOBS, what baseline and maximum slot counts your workload would have benefited from. It distinguishes between “you under-provisioned and queries waited” and “you over-provisioned and slots sat idle.” For workloads with a steady weekly cycle (most analytics workloads), the recommender’s suggested baseline is usually within 10% of the right number.
The reservation autoscaler scales the reservation between your baseline and max settings in roughly one-second increments based on observed demand. You pay only for the slot-seconds actually used, so a reservation with baseline = 100 and max = 1000 costs you 100 baseline slot-hours plus whatever scale-up actually happened. The autoscaler is generally on by default for editions reservations.
The strategy that works for most teams:
- Set the
baselineto your typical interactive query floor (the slot count you want available with zero queue at peak interactive hours). For most teams that is 100-500 slots. - Set the
maxto roughly 2-4x your historical p99 slot usage. Highermaxreduces wall-clock at peak, lowermaxcaps the worst-case bill. - Use workload management to route batch jobs to a separate reservation so they cannot starve interactive users.
- Re-run the slot recommender quarterly. Workloads drift.
The pitfall to avoid: provisioning a single giant reservation that serves both interactive dashboards and overnight batch. The batch jobs eat the slots, dashboards queue, and the fix everyone reaches for first is “add more slots” — but the right fix is workload routing.
Materialized views and BI Engine
BigQuery’s materialized views are pre-computed query results that are kept up to date automatically as the base tables change. They are not the same as the materialized views you may have used in Postgres or Snowflake: the base table refreshes them incrementally, the planner rewrites queries to use them transparently (you do not have to reference the view by name), and they have specific restrictions on what kinds of queries can be materialized.
The pattern they win at: a heavy aggregation against a frequently-updated fact table that is read many times for every write. A common shape is:
CREATE MATERIALIZED VIEW sales_by_day_country
CLUSTER BY country
AS
SELECT
DATE(order_ts) AS day,
country,
COUNT(*) AS orders,
SUM(amount_usd) AS revenue
FROM sales
GROUP BY day, country;
A dashboard that runs SELECT day, SUM(revenue) FROM sales WHERE day BETWEEN ... GROUP BY day will be transparently rewritten by the planner to scan the materialized view instead of the base table — usually a 100x reduction in bytes scanned and slot-time, with the materialization cost amortized over thousands of reads.
The constraints worth knowing before you reach for one:
- The aggregation must use a supported function —
SUM,COUNT,MIN,MAX,AVG,COUNTIF, and a few others.APPROX_*aggregates are not supported. - The view body cannot include
JOINto non-MV-supported sources, subqueries withHAVING, orUNION. - Refreshes are incremental and bounded to 30 minutes maximum between refreshes by default. If the base table has streaming inserts, the view will lag the table by up to the refresh interval.
- You can
REFRESH MATERIALIZED VIEWmanually if you need stronger freshness on a particular run.
For very latency-sensitive dashboards — sub-second response on aggregations — there is also BI Engine, an in-memory cache layer that holds frequently-accessed table partitions in RAM and answers queries directly from there. BI Engine is provisioned in GB-hours per project, costs more per GB than slots, and only helps for the access patterns that fit in memory. It is the right answer for a small set of dashboards that need sub-second response on 10 GB or so of hot data; it is the wrong answer for general workload optimization.
Scheduled queries and resource sharing
Scheduled queries are recurring jobs that run on a cron-like schedule. They are the right tool for daily summary tables, weekly cost reports, hourly aggregation refreshes, and anything else that follows a fixed cadence and does not need to be triggered by a downstream system.
Three things to know about them.
First, scheduled queries inherit the slot reservation of the project they run in. If you do not assign the scheduled query a workload assignment, it runs against the default reservation, which is also serving your interactive queries. The pattern that bites teams: an analyst sets up a scheduled query to refresh a dashboard table at 9:00 AM, the query takes 4 minutes of high slot usage, and every dashboard that loads at 9:00 AM is slow for those four minutes. The fix is a separate batch reservation and a workload assignment that routes scheduled queries to it.
Second, scheduled queries can use @run_time and @run_date parameters to do correct incremental processing. WHERE event_date = @run_date parameterizes the date at scheduling time, which means the query is deterministic enough to cache and easy to backfill — you can run a specific historical date with one click in the console.
Third, scheduled queries do not retry by default. If the query fails because an upstream table is unavailable, you get an error notification and nothing happens until tomorrow. Wrap critical schedules with a downstream check (a second scheduled query that asserts row count or freshness) and surface failures somewhere a human will see them.
Verifying partition pruning
If you have done partitioning and clustering right, partition pruning should be doing 90% of the cost work for you. Verifying that it actually is doing what you think is one of the easiest cost optimizations in BigQuery.
The check before running a query is the dry-run — the planner returns the byte estimate without executing the query:
bq query --dry_run --use_legacy_sql=false \
"SELECT SUM(amount) FROM \`project.dataset.sales\` WHERE order_date = '2026-06-01'"
Compare the byte estimate against the total table size. If the estimate is the full table, partition pruning is not firing — either the predicate is referencing the wrong column, using a function that the planner cannot prove monotonic, or the table is not partitioned the way you think.
The check after running a query is INFORMATION_SCHEMA.JOBS. The total_bytes_processed field is the post-pruning scan, so for a query against a partitioned table, that value should be roughly the partition’s size, not the whole table’s.
The hidden cost trap: parameterized queries with @parameters referencing the partition column are pruned at execution time, not at planning time, so the dry-run estimate is conservative and shows the full table size. The bill, however, reflects the actually-pruned scan. The mismatch confuses people — the dry-run shows $40, the bill shows $0.04, and they assume one of the numbers is wrong. The dry-run is the upper bound.
Six common pitfalls
A short tour of the patterns that show up in every cost audit:
SELECT * against a wide table. BigQuery is columnar; every column you select is a separate read. SELECT * against a 200-column table scans all 200, even if your downstream code only uses three. The fix is mechanical and obvious, but it shows up everywhere because copy-pasting SELECT * is faster than typing column names. Audit your top-cost queries for it.
WHERE filter_column IN (SELECT ... FROM huge_table). Subquery-in-IN is fine when the inner result is small; it is a disaster when it isn’t. The planner can pre-compute the subquery, but in some shapes (especially with non-deterministic functions inside) it materializes the entire inner query as a shuffle. Rewrite as an EXISTS correlation or a JOIN, and the planner can prune.
Joining on CAST(a.id AS STRING) = b.id_str. Any function on a join key prevents the planner from using the column’s clustering or partition layout. The fix is to align the types at table-design time, not in every query.
Streaming inserts at high QPS into a non-streaming-optimized table. Streaming has its own cost — currently $0.05 per GB ingested. Teams often discover their streaming bill is larger than their query bill. The fix is batch loading where possible, or moving to the Storage Write API which is more efficient and supports exactly-once.
Tables without a partition column that “feel small.” A 50 GB table is not small. Querying it without partition pruning costs $0.31 per scan; ten dashboards refreshing it every minute is $1,800 a day. If the table is bigger than 1 GB and queried more than a few times per day, it should be partitioned.
Cache invalidation from streaming buffers. The query cache is invalidated whenever the underlying tables change, and streaming inserts count as “changing.” Tables under continuous streaming load have effectively zero cache hit rate. Either accept the cost, or split the table into a streaming-buffer table and a query-target table that you batch-merge into.
FAQ
What is the difference between bytes processed and bytes billed in BigQuery?
total_bytes_processed is what the query actually scanned after partition pruning and column projection. total_bytes_billed is what you pay for — it is the maximum of the actual scan and a per-query 10 MB minimum, plus on-demand pricing rules that round up to the nearest MB. For most non-trivial queries the two numbers are very close. The minimum charge per query is why running a thousand tiny queries can cost more than one large one.
How is the BigQuery query cache invalidated?
The cache is invalidated by any change to any source table or view, by nondeterministic functions (CURRENT_TIMESTAMP, RAND, SESSION_USER), by DML statements, by results larger than the cache size limit, and by explicit user actions. A common cache-killer is wrapping a dashboard query in WHERE ts > CURRENT_TIMESTAMP() - INTERVAL 7 DAY — the predicate is nondeterministic. Substitute a constant timestamp computed in the dashboard layer.
When should I use BigQuery materialized views vs scheduled queries?
Materialized views are the right choice when (1) the aggregation fits the supported function set, (2) the source table changes frequently, and (3) you want the planner to transparently rewrite queries to use the view without having to reference it by name. Scheduled queries are the right choice when you need full SQL flexibility (joins, window functions, DML), when the source data has a clear daily/hourly cadence, or when you are producing a curated table for downstream use that needs custom logic. Materialized views are cheaper for read-heavy aggregations; scheduled queries are more flexible.
What does the BigQuery slot recommender actually base its recommendation on?
The recommender reads your last 30 days of slot usage from INFORMATION_SCHEMA.JOBS_BY_PROJECT and JOBS_BY_ORGANIZATION, looks at the distribution of slot demand over time, and computes a baseline that minimizes the sum of (idle-slot cost) plus (queue wait converted to wall-clock cost). It assumes the next 30 days look like the last 30. If your workload is changing — new dashboards, a new pipeline, a Black Friday traffic event — re-evaluate manually rather than trusting it.
How do I tell if a BigQuery query is running on the cache?
INFORMATION_SCHEMA.JOBS.cache_hit is a boolean that is TRUE when the result was served from cache and FALSE otherwise. The query also returns instantly and shows 0 bytes processed. A persistent low cache-hit rate on a workload that should hit cache (dashboards reloading the same query every minute) is almost always a deterministic-predicate problem.
Can I cap the cost of a single query in BigQuery?
Yes, but the mechanism is per-job: set maximum_bytes_billed on the job configuration (or --maximum_bytes_billed on the CLI) and BigQuery refuses to start a job that would exceed the limit. This is the right safety belt for any client library that is executing user-supplied SQL — a UI builder, a notebook environment, a CI test runner — to prevent a runaway query from costing $400. It does not, however, work for ad-hoc console queries unless the user sets the flag.
How do I find the most expensive queries in my BigQuery account this week?
Run the “Top 20 queries by slot-time” query against region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT (or JOBS_BY_ORGANIZATION if you have organization-wide visibility) filtered to creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY). Sort by SUM(total_slot_ms) for editions cost or SUM(total_bytes_processed) for on-demand cost. Save this as a scheduled query that posts to Slack and you have a weekly cost audit.
Should I use BigQuery’s --use_query_cache=false for benchmarking?
Yes. Caching makes a re-run of the same query return in milliseconds and look incredibly fast, which is useless for benchmarking. For any “did my optimization help” comparison, run with --use_query_cache=false on both the before and the after. Also flush by changing a parameter or running against a slightly different time window.
What happens to a query if my reservation runs out of slots?
Queries queue. They stay in PENDING state until enough slots free up, at which point they are admitted and run. The autoscaler will scale up toward your max setting, but if max is reached, additional queries continue to queue indefinitely. The wait time is exposed as a stage’s “wait” component in the execution graph and as total_slot_ms - elapsed_ms * worker_count math you can do against INFORMATION_SCHEMA. Persistent high wait is the signal that the reservation needs a bigger max or that you need workload routing.
How does BigQuery’s on-demand pricing interact with the 1 TiB free tier?
The free tier is 1 TiB of bytes processed per month, calculated at the billing-account level. It applies before the per-TiB rate kicks in. If your bill is consistently between $0 and a few dollars a month, the free tier is most of what you are using. Once you start regularly going past 1 TiB in a billing period, the on-demand price ($6.25/TiB in US multi-region) is your marginal rate, and that is the right point to model whether editions would be cheaper.
Are materialized views always faster than the underlying query?
No. There is a cost to maintaining them — every base-table change re-runs the incremental refresh — and there is a per-query overhead to the planner’s rewrite decision. For low-traffic queries against a high-write table, the maintenance cost can exceed the query savings. The break-even is “the materialized view amortizes if it is read more often than the base table is updated.” For dashboards and BI workloads that is almost always true; for one-off analyses it is almost never true.
The shape of every BigQuery cost optimization comes back to the same three steps. First, find the queries that actually account for most of the bill — INFORMATION_SCHEMA.JOBS ranked by total_slot_ms does this in one query. Second, look at the execution graph for the top two or three and find the stage that is consuming the slot-time. Third, change the layout (partition / cluster / materialized view) so that stage either gets pruned away or reads less.
The temptation when a BigQuery bill jumps is to reach for the pricing model or the slot count first, because those are the visible knobs. The reality is that 80% of the bill in most accounts is concentrated in a handful of queries that have a simple optimization sitting in front of them. INFORMATION_SCHEMA.JOBS and total_slot_ms are the only two things you need to find them — and they are exactly the kind of system-table introspection QueryPlane is built for. Point a QueryPlane connection at your BigQuery project, build a dashboard against JOBS_BY_PROJECT, and the top-cost queries become a sortable table on your screen instead of a once-a-month accounting exercise.
If you want to dig deeper, pair this guide with our BigQuery partitioning and clustering guide for the layout side, our Snowflake query profile guide for the equivalent reasoning on Snowflake, and the best BigQuery GUI tools writeup if you’re evaluating clients. And try QueryPlane if you want a faster path from “we have a cost problem” to “here are the five queries that caused it.”