Snowflake Cortex Analyst in Practice

A practical guide to Snowflake Cortex Analyst: semantic model YAML, the REST API, Cortex Search joins for high-cardinality dimensions, verified queries, and the cost model.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own Postgres database and create interactive applications to share with other developers, coworkers, or even your customers.

The text-to-SQL feature that data teams have been promised for a decade finally has a credible production answer on Snowflake: Cortex Analyst. It is the managed service that turns a business question (“what was the gross margin per region last quarter, excluding partner-sourced revenue?”) into a Snowflake SQL statement that runs against your tables, governed by your roles, and grounded in a semantic model you control. It is generally available in nine AWS and Azure regions, with cross-region inference covering the rest, and it routes between a curated set of LLMs at runtime (currently Claude Sonnet 4.6 / 4.5, GPT 4.1, Arctic Text2SQL, and Mistral/Llama combinations) without any of those choices leaking into your app code.

The pitch is straightforward. The catch is that it only works as well as the semantic layer you give it. A schema dump is not a semantic layer; a list of column names without synonyms, descriptions, or relationships will produce confidently wrong SQL on every non-trivial question. The Cortex Analyst team’s own evaluation work shows the production-quality benchmark is anchored on the quality of the semantic model and verified queries, not on the underlying LLM. So the real work — and the real lever — is in the YAML.

In this post, we’ll cover:

Where Cortex Analyst fits — vs CORTEX.COMPLETE, custom RAG pipelines, and Cortex Search
The semantic model — the YAML spec, the parts that matter, and the parts that quietly produce wrong SQL
The REST API — single-turn and multi-turn calls, error handling, and the “no result memory” footgun
Cortex Search for high-cardinality dimensions — joining Cortex Analyst to Cortex Search so "customers in the bay area" resolves to the right canonical region value
Verified queries — the golden-query override that takes the LLM out of the loop for known-hard questions
Pricing and the cost model — CORTEX_ANALYST_USAGE_HISTORY, what counts as a billable message, and the warehouse layer underneath
Pitfalls — the eight failure modes that show up in production deployments

Where Cortex Analyst fits

Snowflake’s Cortex AI umbrella now spans three distinct services that are easy to confuse. SNOWFLAKE.CORTEX.COMPLETE is a SQL function that calls a hosted LLM with whatever prompt you build and returns the raw completion — it is the building block for custom pipelines, but it has no awareness of your schema, no SQL generation, and no governance hooks beyond the warehouse and role the function runs in. Cortex Search is a fully-managed hybrid search service (vector + keyword + reranking) over text columns; it returns ranked documents, not SQL. Cortex Analyst is the text-to-SQL service: it takes a natural-language question, a semantic model, and an optional Cortex Search service for dimension lookups, and returns a Snowflake SQL statement plus the chain-of-reasoning text it used to get there.

The decision rule is concrete. If the user’s question is “summarize this support ticket” the answer is CORTEX.COMPLETE. If the question is “find the three most relevant Q3 product launch documents” the answer is Cortex Search. If the question is “how many enterprise customers churned last quarter and what was their average MRR?” — a question that wants a SQL statement against governed tables — the answer is Cortex Analyst. Teams that ship one without the others tend to back-solve into something that looks like the missing two anyway, so it’s worth understanding all three up front and choosing on intent.

Compared to a DIY text-to-SQL pipeline (LLM + schema prompt + retries), Cortex Analyst is doing four things you would otherwise have to build: it negotiates between multiple specialized LLMs (including Arctic Text2SQL, Snowflake’s own SQL-tuned model), it validates the generated SQL against the semantic model before returning it, it integrates with Cortex Search for high-cardinality lookups so the model never has to guess customer_id = 42, and it consumes a verified-queries repository that lets you encode “golden” answers for known-hard questions. Replicating those four pieces in a custom stack is a six-month project; Cortex Analyst is a config file and a REST call.

The semantic model

The semantic model is the single most important artifact in a Cortex Analyst deployment. Snowflake supports two formats: a YAML file uploaded to a stage (the original format, still supported) and a Semantic View — a first-class Snowflake object (CREATE SEMANTIC VIEW) that holds the same definitions but lives in the data catalog, supports grants, and integrates with SHOW and DESC. New deployments should prefer Semantic Views; existing YAML deployments keep working. Both formats use the same field names, so the YAML examples below translate one-to-one to a Semantic View DDL.

A minimal model has a name, one or more tables, and (almost always) a set of relationships:

name: revenue_semantic_model
description: |
  Revenue, customers, and product mix for the finance team.
  All amounts in USD; fiscal year ends January 31.

tables:
  - name: orders
    description: One row per placed order.
    base_table:
      database: ANALYTICS
      schema: FACT
      table: ORDERS
    dimensions:
      - name: order_status
        synonyms: ["status", "state", "order state"]
        description: Lifecycle state of the order.
        expr: STATUS
        data_type: VARCHAR
        is_enum: true
      - name: source_channel
        synonyms: ["channel", "acquisition channel"]
        expr: SOURCE_CHANNEL
        data_type: VARCHAR
        is_enum: true
    time_dimensions:
      - name: ordered_at
        synonyms: ["order date", "placed at", "when ordered"]
        expr: ORDERED_AT
        data_type: TIMESTAMP_NTZ
    facts:
      - name: gross_revenue
        description: Order subtotal before tax, discounts, and refunds.
        expr: SUBTOTAL_USD
        data_type: NUMBER
      - name: discount
        expr: DISCOUNT_USD
        data_type: NUMBER
    metrics:
      - name: net_revenue
        synonyms: ["net sales", "net revenue", "revenue net of discounts"]
        description: gross_revenue minus discount, summed.
        expr: SUM(SUBTOTAL_USD - DISCOUNT_USD)

Five field types do the real work. Dimensions are categorical attributes used for grouping and filtering — region, status, customer tier. Time dimensions are the same idea for date and timestamp columns, but they get special LLM handling for fiscal-period and relative-date phrasing ("this quarter", "YoY", "trailing 30 days"). Facts are row-level numeric values that have no aggregate baked in — subtotal_usd on the orders table is a fact. Metrics are aggregated expressions — SUM(subtotal_usd - discount_usd) is the net_revenue metric. Relationships connect tables on shared keys for joins.

Two of the optional fields punch above their weight. synonyms is a list of alternate phrasings the LLM treats as equivalent to the field’s name; the post-deploy quality of a model correlates more strongly with synonym coverage than almost anything else, because the LLM stops having to “guess” that "income" means net_revenue. description is fed to the LLM as part of the prompt context, so it should describe the business meaning of the field, not the SQL — "Lifecycle state of the order" is more useful than "VARCHAR column". The two together turn a schema dump into something the model can ground its choices in.
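Since synonym and description coverage matter this much, it can help to check for gaps before deploying. The following is a minimal sketch, not part of any Snowflake tooling: it assumes the semantic model YAML has already been parsed into a plain dict (for example by a YAML loader), and the function name lint_semantic_model is hypothetical.

```python
from typing import Any

# The four per-table field lists used in the model above.
FIELD_KINDS = ("dimensions", "time_dimensions", "facts", "metrics")


def lint_semantic_model(model: dict[str, Any]) -> list[str]:
    """Return one warning per field that lacks synonyms or a description."""
    warnings = []
    for table in model.get("tables", []):
        for kind in FIELD_KINDS:
            for field in table.get(kind, []):
                path = f"{table['name']}.{field['name']}"
                if not field.get("synonyms"):
                    warnings.append(f"{path}: no synonyms")
                if not (field.get("description") or "").strip():
                    warnings.append(f"{path}: no description")
    return warnings


# A cut-down version of the model shown earlier, as a YAML loader would return it.
model = {
    "name": "revenue_semantic_model",
    "tables": [{
        "name": "orders",
        "dimensions": [
            {"name": "order_status", "synonyms": ["status"],
             "description": "Lifecycle state of the order."},
            {"name": "source_channel", "synonyms": ["channel"]},
        ],
        "facts": [{"name": "discount"}],
    }],
}

for warning in lint_semantic_model(model):
    print(warning)
```

Running a check like this in CI keeps new fields from landing without the context the LLM needs; it says nothing about whether the synonyms are good, only that they exist.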

Relationships are simple but unforgiving:

relationships:
  - name: orders_to_customers
    left_table: orders
    right_table: customers
    relationship_columns:
      - left_column: customer_id
        right_column: id

Every join path that Cortex Analyst can produce must be declared. Tables that are not connected via a relationships entry are not joinable from the model’s perspective — the LLM will not invent a join condition. This is by design (it’s the same principle that makes dbt’s semantic layer safe), but it’s the most common cause of “why didn’t it find that?” in early deployments.
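One way to audit this is to treat the declared relationships as a graph and list the tables that cannot be reached from a given starting table. The sketch below assumes the model has been parsed into a dict and treats relationships as undirected for reachability; both are simplifications, and the helper names are hypothetical.

```python
from collections import defaultdict, deque


def reachable_tables(model: dict, start: str) -> set[str]:
    """Tables reachable from `start` by following declared relationships."""
    graph = defaultdict(set)
    for rel in model.get("relationships", []):
        graph[rel["left_table"]].add(rel["right_table"])
        graph[rel["right_table"]].add(rel["left_table"])
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first walk over the relationship graph
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def unjoinable_tables(model: dict, start: str) -> set[str]:
    """Tables in the model that no declared join path connects to `start`."""
    all_tables = {t["name"] for t in model.get("tables", [])}
    return all_tables - reachable_tables(model, start)


model = {
    "tables": [{"name": "orders"}, {"name": "customers"}, {"name": "products"}],
    "relationships": [
        {"name": "orders_to_customers",
         "left_table": "orders", "right_table": "customers"},
    ],
}
print(unjoinable_tables(model, "orders"))  # {'products'}
```

Here products is declared as a table but has no relationship entry, so no question can combine it with orders or customers.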

The expression field — expr — is a SQL fragment evaluated against the base_table. It can be a column name, a CASE expression, a coalesce, a window function, or a sub-aggregate. The implication is significant: if a business calculation has many edge cases (a fiscal-calendar adjustment, a discontinued-product exclusion, an FX conversion), the right place to encode it is inside the expr, not in the user’s question. The semantic model is the canonical place where the business definition of net_revenue lives; downstream questions stop having to repeat it.

The REST API

The Cortex Analyst REST endpoint is POST /api/v2/cortex/analyst/message, authenticated with an OAuth token from a Snowflake account. The minimal call shape is:

curl -X POST \
  -H "Authorization: Bearer $SNOWFLAKE_OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -H "X-Snowflake-Authorization-Token-Type: OAUTH" \
  "https://${ACCOUNT}.snowflakecomputing.com/api/v2/cortex/analyst/message" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "what was net revenue by region in Q3 2026?"}
        ]
      }
    ],
    "semantic_view": "ANALYTICS.SEMANTIC.REVENUE_MODEL"
  }'

The request specifies a semantic model with one of four mutually-exclusive fields: semantic_view (fully-qualified name of a Semantic View object), semantic_model_file (stage path like @db.schema.stage/file.yaml), semantic_model (inline YAML string — useful for prototyping but not for production), or semantic_models (an array of multiple models that share a conversation). Pick semantic_view for any new deployment; the object form gives you GRANT, DESC, and lineage out of the box.
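Because the fields are mutually exclusive, it is worth enforcing that on the client before the request goes out. A minimal sketch of a request-body builder follows; build_request is a hypothetical helper, and it only assembles the JSON body shown in the curl example without sending anything.

```python
import json

MODEL_FIELDS = ("semantic_view", "semantic_model_file",
                "semantic_model", "semantic_models")


def build_request(question: str, **model_ref) -> dict:
    """Build a single-turn request body with exactly one semantic model field."""
    chosen = {k: v for k, v in model_ref.items() if v is not None}
    unknown = set(chosen) - set(MODEL_FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    if len(chosen) != 1:
        raise ValueError(
            f"expected exactly one of {MODEL_FIELDS}, got {sorted(chosen)}")
    body = {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]}
        ]
    }
    body.update(chosen)
    return body


body = build_request(
    "what was net revenue by region in Q3 2026?",
    semantic_view="ANALYTICS.SEMANTIC.REVENUE_MODEL",
)
print(json.dumps(body, indent=2))
```

Passing two model fields, or none, raises a ValueError locally instead of surfacing as an API error.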

The response carries the generated SQL plus structured reasoning:

{
  "request_id": "9f4e...",
  "message": {
    "role": "analyst",
    "content": [
      {"type": "text", "text": "I'll sum net_revenue grouped by region for Q3 2026."},
      {
        "type": "sql",
        "statement": "SELECT c.region, SUM(o.subtotal_usd - o.discount_usd) AS net_revenue FROM analytics.fact.orders o JOIN analytics.dim.customers c ON o.customer_id = c.id WHERE o.ordered_at >= '2026-08-01' AND o.ordered_at < '2026-11-01' GROUP BY c.region",
        "confidence": {
          "verified_query_used": null
        }
      }
    ]
  },
  "warnings": [],
  "response_metadata": {
    "model_names": ["claude-sonnet-4-6"],
    "question_category": "CLEAR_SQL"
  }
}

The content array has up to three kinds of parts: text (the model’s interpretation, suitable for showing to the user as “here’s what I’m doing”), sql (a statement plus a confidence block that names the verified query if one was used), and suggestions (a list of clarifying questions that appears when the model judges the request ambiguous). The sql and suggestions parts are mutually exclusive — if the model wasn’t confident enough to produce SQL it will produce suggestions instead, and an app’s UI should handle both states.
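Handling both states can be sketched as a small parser that sorts the content parts and tells the UI which state it is in. This is an illustration under assumptions: the AnalystReply class and next_ui_state function are hypothetical, and the inner key of a suggestions part is assumed to be a list named suggestions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AnalystReply:
    request_id: str
    text: list[str] = field(default_factory=list)
    sql: Optional[str] = None
    verified_query_used: Any = None
    suggestions: list[str] = field(default_factory=list)


def parse_reply(response: dict) -> AnalystReply:
    """Sort the content parts of a response into text, sql, and suggestions."""
    reply = AnalystReply(request_id=response.get("request_id", ""))
    for part in response["message"]["content"]:
        kind = part.get("type")
        if kind == "text":
            reply.text.append(part["text"])
        elif kind == "sql":
            reply.sql = part["statement"]
            confidence = part.get("confidence") or {}
            reply.verified_query_used = confidence.get("verified_query_used")
        elif kind == "suggestions":
            # Assumed shape: {"type": "suggestions", "suggestions": [...]}
            reply.suggestions.extend(part.get("suggestions", []))
    return reply


def next_ui_state(reply: AnalystReply) -> str:
    """Decide what the app should do with the reply."""
    if reply.sql is not None:
        return "run_sql"
    if reply.suggestions:
        return "ask_clarification"
    return "show_text"


sql_response = {
    "request_id": "req-1",  # hypothetical id
    "message": {"role": "analyst", "content": [
        {"type": "text", "text": "I'll sum net_revenue grouped by region."},
        {"type": "sql", "statement": "SELECT 1",
         "confidence": {"verified_query_used": None}},
    ]},
}
ambiguous_response = {
    "request_id": "req-2",  # hypothetical id
    "message": {"role": "analyst", "content": [
        {"type": "text", "text": "Your question is ambiguous."},
        {"type": "suggestions",
         "suggestions": ["Net revenue by region?", "Gross revenue by region?"]},
    ]},
}

print(next_ui_state(parse_reply(sql_response)))        # run_sql
print(next_ui_state(parse_reply(ambiguous_response)))  # ask_clarification
```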

The request_id field is not optional in practice: it’s the key to the feedback endpoint (POST /api/v2/cortex/analyst/feedback) where end-users vote thumbs-up/thumbs-down on the generated SQL. Snowflake doesn’t surface that feedback in CORTEX_ANALYST_USAGE_HISTORY, so log each vote (keyed by request_id) into your own table — it’s the only signal for “the model is consistently wrong on questions like this.” Wire the feedback button in your UI on day one or you’ll be debugging blind for the first month of production.

A subtle but important detail: Cortex Analyst does not execute the SQL. The service returns the statement; your app is responsible for running it against a warehouse with the right role. This is deliberate — it preserves Snowflake’s role-based access model. The user whose token authenticates the REST call must have SELECT on the underlying tables; if they don’t, the generated SQL is correct but the execution fails with a privilege error, which is the right failure mode for a governed system but a confusing one for a user who just asked a question.

Cortex Search for high-cardinality dimensions

The single most common LLM failure mode in text-to-SQL is producing WHERE region = 'bay area' when the actual column value is 'San Francisco Bay Area' (or a region_id that maps to it through a dimension table). The semantic model can encode synonyms for known values, but enumerating every variant in a 50,000-row customer dimension is not viable.

Cortex Analyst solves this by letting a dimension declare a cortex_search_service that the LLM uses to resolve values at query time. The flow is: the LLM identifies that the user is filtering on region, looks up the configured Cortex Search service for that dimension, sends the user’s literal value ("bay area") to Cortex Search, and gets back the canonical value(s) that match. The generated SQL then filters on the resolved value instead of the user’s literal.

Setup is two steps. First, create a Cortex Search service over the dimension column:

CREATE OR REPLACE CORTEX SEARCH SERVICE analytics.semantic.region_search
  ON region_name
  WAREHOUSE = analyst_xs
  TARGET_LAG = '1 hour'
  AS
  SELECT region_id, region_name, region_description
  FROM analytics.dim.regions;

Then attach it to the dimension in the semantic model:

- name: region
  synonyms: ["area", "location", "metro"]
  expr: REGION_NAME  # the column holding the literal values the search service returns
  data_type: VARCHAR
  cortex_search_service:
    service: analytics.semantic.region_search
    literal_column: region_name

literal_column is the text column Cortex Search ranks against. When the model identifies a filter on this dimension, it sends the user’s literal text ("bay area") to the search service and gets back the canonical column value to filter on — e.g. 'San Francisco Bay Area' — instead of guessing WHERE region = 'bay area'. The same pattern works for product names, customer names, ticket categories, anywhere a dimension has too many values to enumerate by synonym.

This is the integration point that makes the difference between a demo-quality and a production-quality deployment. Cortex Search bills separately (it has its own warehouse and per-token cost for indexing), but the alternative is fielding “the model can’t find my customer” tickets every week.

Verified queries — the override path

Some questions are hard enough that the LLM gets them wrong consistently — fiscal-calendar gymnastics, customer-cohort definitions that involve four WITH clauses, regulatory carve-outs. The semantic model spec includes a verified_queries array exactly for these:

verified_queries:
  - name: net_new_arr_by_quarter
    question: "what was net new ARR by quarter for the last 8 quarters?"
    sql: |
      WITH expansion AS (
        SELECT DATE_TRUNC('quarter', booked_at) AS qtr,
               SUM(arr_delta) AS amount
        FROM analytics.fact.subscription_changes
        WHERE change_type IN ('new', 'expansion')
        GROUP BY 1
      ), churn AS (
        SELECT DATE_TRUNC('quarter', booked_at) AS qtr,
               SUM(arr_delta) AS amount
        FROM analytics.fact.subscription_changes
        WHERE change_type IN ('churn', 'contraction')
        GROUP BY 1
      )
      SELECT e.qtr, e.amount + COALESCE(c.amount, 0) AS net_new_arr
      FROM expansion e
      LEFT JOIN churn c USING (qtr)
      WHERE e.qtr >= DATEADD(quarter, -8, DATE_TRUNC('quarter', CURRENT_DATE()))
      ORDER BY e.qtr
    verified_at: 1750000000
    verified_by: "finance_data_team"
    use_as_onboarding_question: true

When a user’s question is semantically similar enough to a verified query, Cortex Analyst skips LLM generation and returns the verified SQL verbatim, with confidence.verified_query_used populated in the response. The use_as_onboarding_question flag exposes the verified query in the suggested-questions UI, which is the single fastest way to teach a new user what the model can answer well.

The right operational pattern is to wire feedback (👎 from a user) into a triage queue, have a data engineer write the correct SQL, and add it as a verified query. Over a few weeks the verified set absorbs the “every Q3 someone asks this in a slightly different way” cases and the LLM only handles novel questions — which dramatically improves the perceived accuracy of the system even though the underlying generation quality is unchanged.
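The last step of that triage loop, turning a corrected answer into a verified_queries entry, can be sketched as a small helper. The function name to_verified_query and the slug rule for name are assumptions for illustration; the output keys mirror the YAML example above, with verified_at as a Unix timestamp.

```python
import re
import time
from typing import Optional


def to_verified_query(question: str, sql: str, verified_by: str,
                      onboarding: bool = False,
                      now: Optional[int] = None) -> dict:
    """Build a verified_queries entry from a triaged question and corrected SQL."""
    # Derive a readable identifier from the question text (hypothetical rule).
    name = re.sub(r"[^a-z0-9]+", "_", question.lower()).strip("_")[:60]
    return {
        "name": name,
        "question": question,
        "sql": sql.strip(),
        "verified_at": int(time.time()) if now is None else now,
        "verified_by": verified_by,
        "use_as_onboarding_question": onboarding,
    }


entry = to_verified_query(
    question="What was churned ARR last quarter?",
    sql="SELECT 1  -- placeholder for the SQL the data engineer wrote",
    verified_by="finance_data_team",
)
print(entry["name"])  # what_was_churned_arr_last_quarter
```

The resulting dict can be appended to the model's verified_queries list and serialized back to YAML with whatever loader the team already uses.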

Pricing and the cost model

Cortex Analyst billing has two layers and one trap. The first layer is the Cortex Analyst service itself, billed per message processed (only HTTP 200 responses count — failed calls are free, which incentivizes well-scoped retries). The second layer is the warehouse that executes the generated SQL, billed normally at credits per second. Cortex Search, if used, bills as a third line item: an indexing-warehouse cost plus a per-query cost.

The trap is that the per-message cost scales with the size of the semantic model, not the size of the question. A semantic model with 50 tables and a long verified-queries section produces a larger LLM prompt than a model with 5 tables, and the per-message cost reflects that. Teams that ship one giant model for the whole company end up paying for every department’s metadata on every question. The right pattern is one Semantic View per business domain — finance, product, support — with the right access controls so that each app only loads the model it needs. The semantic_models request field (an array) supports the inverse case where a question genuinely spans domains, without forcing the always-on cost of a monolith.

Monitor usage and cost through SNOWFLAKE.ACCOUNT_USAGE.CORTEX_ANALYST_USAGE_HISTORY (rolled up daily, with the usual account-usage latency). Its columns are START_TIME, END_TIME, REQUEST_COUNT, CREDITS, and USERNAME: enough to trend request volume and spend, but feedback is not exposed in this view. To catch a model regression by negative-feedback rate, log the thumbs-down votes from the feedback endpoint (keyed by request_id) into your own table and trend them over a rolling window; a spike there shows up before the corresponding spike in support tickets.

-- request volume and credit spend (feedback is not in this view; track it in your own table)
SELECT
  DATE_TRUNC('day', start_time) AS day,
  SUM(request_count)            AS requests,
  SUM(credits)                  AS credits_used
FROM snowflake.account_usage.cortex_analyst_usage_history
WHERE start_time >= DATEADD(day, -14, CURRENT_DATE())
GROUP BY 1
ORDER BY day DESC;

A negative-feedback rate (thumbs-down votes as a share of all votes in your own feedback log) over 20% on any single day is a “drop everything and look at the questions” signal. Usually it’s a new feature or onboarding flow producing questions the model wasn’t tuned for.
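The negative-feedback rate has to come from your own feedback log, since the usage-history view carries no votes. The sketch below shows the calculation against an in-memory SQLite table standing in for that log; the table name analyst_feedback, its columns, and the sample rows are all hypothetical, and in practice the table would live in Snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE analyst_feedback (
        request_id TEXT PRIMARY KEY,   -- request_id from the Cortex Analyst response
        voted_on   TEXT NOT NULL,      -- ISO date of the vote
        positive   INTEGER NOT NULL    -- 1 = thumbs-up, 0 = thumbs-down
    )
""")
conn.executemany(
    "INSERT INTO analyst_feedback VALUES (?, ?, ?)",
    [
        ("r1", "2026-03-01", 1), ("r2", "2026-03-01", 1), ("r3", "2026-03-01", 1),
        ("r4", "2026-03-01", 1), ("r5", "2026-03-01", 0),
        ("r6", "2026-03-02", 0), ("r7", "2026-03-02", 1), ("r8", "2026-03-02", 0),
    ],
)


def daily_negative_pct(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Percentage of thumbs-down votes per day, oldest day first."""
    return conn.execute("""
        SELECT voted_on,
               100.0 * SUM(CASE WHEN positive = 0 THEN 1 ELSE 0 END) / COUNT(*)
        FROM analyst_feedback
        GROUP BY voted_on
        ORDER BY voted_on
    """).fetchall()


THRESHOLD = 20.0  # the alert level discussed above
alerts = [day for day, pct in daily_negative_pct(conn) if pct > THRESHOLD]
print(alerts)  # ['2026-03-02']
```

In the sample data the first day sits exactly at 20% (one of five votes) and does not trip the alert; the second day, with two of three votes negative, does.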

Multi-turn conversations and the no-memory trap

The REST endpoint supports multi-turn conversations: pass the full prior messages array (alternating user and analyst roles) and the model can resolve references like “now group that by product” against the prior question. This is the right pattern for an interactive UI.

The single most important thing to know about multi-turn is that Cortex Analyst does not have access to the results of previous SQL queries, only the SQL itself. If the prior turn returned region=BAY_AREA, net_revenue=$2.4M and the user asks “why is it so high?”, the model has no way to know the value was $2.4M. The semantic model and the prior SQL are the only context. Apps that show prior results to the LLM in the message history (by re-injecting them into the user message) tend to do better at follow-ups, but at the cost of higher token usage and potentially leaking row-level data into the LLM context.
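Re-injecting a result summary can be as simple as prefixing the follow-up question with it. The sketch below is one possible shape, not a documented pattern: the helper names and the "Previous result:" prefix are assumptions, and the SQL statement is a placeholder. Keep the summary aggregated so row-level data stays out of the LLM context.

```python
def user_message(text: str) -> dict:
    """A user turn in the shape the messages array expects."""
    return {"role": "user", "content": [{"type": "text", "text": text}]}


def with_result_summary(question: str, summary: str) -> dict:
    """Prefix a follow-up question with a short summary of the previous result."""
    return user_message(f"Previous result: {summary}\n\n{question}")


history = [
    user_message("what was net revenue by region in Q3 2026?"),
    {   # the analyst turn, passed back as the service returned it
        "role": "analyst",
        "content": [
            {"type": "text",
             "text": "I'll sum net_revenue grouped by region for Q3 2026."},
            {"type": "sql",
             "statement": "SELECT region, SUM(net_revenue) FROM orders GROUP BY region"},
        ],
    },
]

# Without the summary the model would see only the SQL above, not its output.
history.append(
    with_result_summary("why is it so high?",
                        "region=BAY_AREA, net_revenue=$2.4M")
)
print(history[-1]["content"][0]["text"])
```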

Conversations also don’t have a built-in length limit on the service side, but the underlying LLM context window does. Long sessions degrade in quality starting around 10-15 turns; the right behavior is to offer a “start fresh” affordance and to truncate transparently when the session crosses a threshold.
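Truncation can be sketched as keeping only the most recent user turns together with their analyst replies, so the history still starts on a user message. The helper name truncate_history and the max_turns parameter are hypothetical, and the right threshold depends on your own quality measurements.

```python
def truncate_history(messages: list[dict], max_turns: int) -> list[dict]:
    """Keep the last `max_turns` user turns and the analyst replies that follow them."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    user_idx = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_idx) <= max_turns:
        return list(messages)
    # Cut at a user message so the kept history starts on a user turn.
    return messages[user_idx[-max_turns]:]


history = []
for n in range(1, 5):
    history.append({"role": "user",
                    "content": [{"type": "text", "text": f"question {n}"}]})
    history.append({"role": "analyst",
                    "content": [{"type": "text", "text": f"answer {n}"}]})

recent = truncate_history(history, max_turns=2)
print([m["content"][0]["text"] for m in recent])
# ['question 3', 'answer 3', 'question 4', 'answer 4']
```

Dropped turns are gone from the model's point of view, so a follow-up that refers back to them will no longer resolve; that is a reason to tell the user when truncation happens.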

Eight common pitfalls

1. The semantic model is a schema dump. Names alone are not enough. Without synonyms, description, and explicit expr definitions, the model has to guess what every field means, and it will guess wrong on every business-specific term. Spend a day on the model before judging Cortex Analyst’s accuracy.

2. Tables exist but no relationships are declared. The LLM will not invent join conditions. If two tables aren’t connected by a relationships entry, no question can cross them. Audit the relationship graph the first time a question fails — "do the tables actually connect in the model?" is the right starting question.

3. High-cardinality dimensions have no Cortex Search service. WHERE customer_name = 'acme' will not find 'Acme Industries, Inc.' without value resolution. For every dimension over a few hundred distinct values, attach a Cortex Search service.

4. is_enum: true is missing on enum columns. The flag tells the model the column has a small finite set of values; without it, the LLM treats every enum-style filter as fuzzy. The cost of mis-tagging is the same as case (3): hand-typed values that don’t match the canonical form.

5. Verified queries are never written. Cortex Analyst will eventually be good enough on novel questions, but it’s good immediately on verified questions. Every team should treat negative-feedback events as candidates for new verified queries, with a weekly triage to land the top 3-5.

6. The semantic model is a monolith. Per-message cost scales with model size. Break the model into per-domain Semantic Views (finance, product, support) and request only the relevant view per question. The semantic_models array handles the rare cross-domain case without forcing everyone to pay for everything.

7. The generated SQL runs under a service account, not the user’s role. It is tempting to run the generated SQL with a service account so users without table grants can still ask questions. This breaks Snowflake’s role-based access model: the user’s role checks evaporate. The right pattern is OAuth on behalf of the user (or external OAuth via your IdP), so the generated SQL runs as the asker and policies fire normally.

8. No feedback button in the UI. Negative feedback is the only signal for “the model is wrong on questions like this.” Without it, you’re debugging from support tickets and Slack threads. Wire request_id → feedback endpoint on day one, and log the votes into your own table so you can watch the negative-feedback rate weekly (the usage-history view doesn’t expose it).

Wrapping up

Cortex Analyst is the production answer to text-to-SQL on Snowflake. It hides the multi-model routing and the SQL validation that a DIY pipeline would have to build itself, and it composes cleanly with Cortex Search for dimension value resolution and verified queries for known-hard questions. The work is not in the API call — that’s a few lines — but in the semantic model and the operational loop around feedback. Teams that treat the semantic model as a product (synonyms, descriptions, relationships, verified queries) ship something users trust; teams that point Cortex Analyst at a raw schema get the same demo-quality output as every other text-to-SQL tool.

The patterns that pay off fastest: one Semantic View per business domain so per-message cost stays in line with what’s actually being asked, Cortex Search services on every dimension over a few hundred distinct values, verified queries for the questions that come up every quarter, and a feedback button wired to the feedback endpoint (logged to your own table) from day one. With those four in place, the system gets measurably better every week as the verified set absorbs the repeat-pattern questions and the synonym list grows to match how the business actually talks.

The natural next step in a Snowflake AI stack is the storage layer underneath the analytics tables: Iceberg tables for open-format interoperability, dynamic tables for the materialization layer that holds the metrics Cortex Analyst queries, clustering keys for the partition pruning that keeps generated SQL fast, and warehouse sizing for the compute that ultimately executes everything. And when you need a SQL editor and app builder that works on top of Cortex Analyst output (running generated SQL against your warehouse, charting results, and sharing dashboards with role-based access), the QueryPlane Snowflake integration connects in a few minutes.

Frequently asked questions

What is Snowflake Cortex Analyst? Cortex Analyst is Snowflake’s managed text-to-SQL service. It turns a natural-language question, paired with a semantic model that describes your tables and metrics, into a Snowflake SQL statement plus structured reasoning about how it derived it. It runs on a curated set of LLMs (currently Claude Sonnet 4.6/4.5, GPT 4.1, Snowflake’s Arctic Text2SQL, and Mistral/Llama combinations) and selects between them at runtime. It is generally available in nine AWS and Azure regions, with cross-region inference for the rest.

How is Cortex Analyst different from SNOWFLAKE.CORTEX.COMPLETE? CORTEX.COMPLETE is a low-level SQL function that calls a hosted LLM with whatever prompt you provide and returns the raw completion. It has no awareness of your schema, no SQL generation, no semantic model. Cortex Analyst is the higher-level service built on top: it takes a question and a semantic model and produces a Snowflake SQL statement, with chain-of-reasoning text, verified-query matching, and Cortex Search integration. Use CORTEX.COMPLETE to build custom LLM pipelines; use Cortex Analyst for question-to-SQL on governed data.

How does Cortex Analyst integrate with Cortex Search? A dimension in the semantic model can declare a cortex_search_service block that points to a Cortex Search service. At query time, when the LLM identifies a filter on that dimension, it sends the user’s literal value to Cortex Search and gets back the canonical column value. The generated SQL then filters on the resolved value instead of the user’s text. This is how queries like "customers in the bay area" filter on the canonical value (e.g. 'San Francisco Bay Area') instead of the user’s raw text 'bay area'.

What is a verified query? A verified query is a question/SQL pair, stored in the verified_queries section of the semantic model, that Cortex Analyst returns verbatim when a user asks a semantically similar question. It bypasses the LLM and gives you 100% deterministic SQL for the questions you know are hard. Verified queries have verified_at and verified_by fields for governance, and a use_as_onboarding_question flag that exposes them in the suggested-questions UI. The right operational pattern is to wire negative feedback into a triage queue and add 3-5 verified queries per week from there.

Does Cortex Analyst execute the generated SQL? No. Cortex Analyst returns the SQL statement; your application is responsible for executing it against a warehouse with the user’s role. This preserves Snowflake’s role-based access model — the same row access policies, masking policies, and column grants that govern direct Snowflake queries also govern Cortex Analyst-generated queries, because they’re executed by the user’s role. Apps that bypass this by executing as a service account lose the governance layer entirely.

How is Cortex Analyst billed? Two layers, sometimes three. Cortex Analyst itself is billed per message processed (only HTTP 200 responses are billable). The warehouse that executes the generated SQL is billed normally at credits per second. If you’ve attached Cortex Search services to dimensions, indexing and per-query costs for those services are a third line item. The per-message cost scales with semantic model size, so monolithic models cost more per question than focused per-domain models. Monitor via SNOWFLAKE.ACCOUNT_USAGE.CORTEX_ANALYST_USAGE_HISTORY.

Does Cortex Analyst remember results from previous questions? No. In a multi-turn conversation, Cortex Analyst sees the prior messages (user questions and the analyst’s text + SQL responses), but it does not see the rows returned when your app ran the SQL. A follow-up question like "why is it so high?" has no value to reason about because the model never saw the result. Apps that want grounded follow-ups have to re-inject result summaries into the message history, at the cost of higher token usage and the risk of leaking row data into the LLM context.

Which LLM does Cortex Analyst use? Cortex Analyst selects between a curated set at runtime: Anthropic Claude Sonnet 4.6 and Sonnet 4.5, OpenAI GPT 4.1, Snowflake’s own Arctic Text2SQL model, and combinations of Mistral Large 2 and Llama 3.1. The selection is opaque to the caller — there is no field to pin a specific model — but the response_metadata.model_names field in the response reports which model(s) produced the answer. The benefit of the abstraction is that Snowflake routes you to whichever model is best for the question category, including SQL-tuned models for SQL-heavy questions, without you having to track the changing landscape.

Should I use a YAML file or a Semantic View? For any new deployment, use a Semantic View (CREATE SEMANTIC VIEW). It’s a first-class Snowflake object, supports grants, shows up in SHOW and DESC, and integrates with lineage. The YAML format is supported for backwards compatibility — existing deployments do not need to migrate — but the YAML lives on a stage and doesn’t get the catalog benefits. Both formats use the same field names, so the YAML examples in this post translate directly to a Semantic View definition.

What roles and grants are required? The user calling the API needs the SNOWFLAKE.CORTEX_ANALYST_USER database role (or the broader SNOWFLAKE.CORTEX_USER), SELECT on the tables referenced by the semantic model, and READ on the stage holding the YAML file (if using the file format) or USAGE + SELECT on the Semantic View object (if using the catalog form). The SQL execution then runs under the user’s role, so masking policies, row access policies, and column grants fire as they would for a hand-written query.