Copilot Studio Meets Snowflake: A Working Log on Agent Strategy
Connecting Copilot Studio to Snowflake isn't the hard part. (Okay, it's more complex than most integrations, but not impossible.)
The hard part is doing it well.
It's tempting to wire up a connector, let the agent generate SQL, and hope the model figures things out. Sometimes that works. Often, it doesn't. You end up with inconsistent responses, brittle queries, and a growing collection of 422 errors that all trace back to the same root cause: the architecture was never designed for how agents actually reason.
This post is about implementing an agent strategy for accessing Snowflake data in Copilot Studio — not just wiring up a connector and hoping the SQL gods are merciful.
There are three valid patterns to compare, and they're easy to conflate even though they behave very differently in production:
Option A: Direct SQL connector — the Copilot agent generates SQL and sends it through the Snowflake connector.
Option B: Cortex Analyst (direct API call) — Copilot calls Cortex Analyst, which generates SQL from a semantic model.
Option C: Snowflake Data Agent (Cortex Agent) — Copilot calls a Snowflake Cortex Agent, which orchestrates Analyst, Cortex Search, and any custom tools you've wired up.
The shorthand: A is “Copilot writes SQL.” B is “Snowflake writes SQL from a semantic model.” C is “Snowflake runs an agent that decides what to do, and one of those things might be writing SQL.”
The architecture choice matters because it shapes user experience: response consistency, how well ambiguous questions get handled, and how often the assistant “wanders” into creatively-wrong SQL. Picking the right pattern measurably reduces 422 errors and keeps answers grounded enough that your users stop emailing you screenshots at 9pm.
This is a working log. Practical, architecture-focused, reusable. No winner crowned — just trade-offs surfaced, with real evaluation data to back them up.
Who this post is for
- Copilot Studio builders deciding between “simple now” and “scalable later”
- Data platform teams supporting Snowflake-backed agent scenarios
- Architects balancing reliability, governance, and how fast they can ship without regretting it in Q3
The three patterns, in detail
Option A — Standard SQL connector
User asks a question in Copilot Studio
Agent generates SQL
Snowflake connector executes it
Agent summarizes the results
The agent does the thinking. Snowflake just runs what it's told.
How it's wired: Option A uses the native Snowflake connector in Power Platform. The connector is purpose-built to execute SQL — that's its design. The Copilot Studio agent generates a query string and hands it to the connector's “Execute query” action. There's no middleware, no custom code, no semantic layer between the agent and Snowflake. That's the appeal: zero plumbing. It's also exactly why this option is the most fragile — every column name and join the agent guesses has to land perfectly, because nothing downstream is going to catch it.
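To make the fragility concrete, here's a minimal sketch of a pre-flight guard that catches the "agent invented a column" class of 422 before the SQL ever reaches the connector. The table and column names are hypothetical, and the regex is a crude illustration rather than a real SQL parser:

```python
import re

# Hypothetical schema snapshot -- in practice you'd pull this from
# Snowflake's INFORMATION_SCHEMA and cache it.
KNOWN_COLUMNS = {
    "TICKET_SALES": {"SALE_ID", "MATCH_ID", "TICKET_PRICE", "SOLD_AT"},
    "MATCHES": {"MATCH_ID", "HOME_TEAM", "AWAY_TEAM", "KICKOFF"},
}

def unknown_identifiers(sql: str) -> set[str]:
    """Return table.column references the schema doesn't recognize.

    A regex scan, not a parser -- just enough to reject agent-generated
    SQL that guesses names, instead of letting Snowflake return a 422.
    """
    refs = re.findall(r"\b([A-Z_]+)\.([A-Z_]+)\b", sql.upper())
    return {
        f"{table}.{column}"
        for table, column in refs
        if column not in KNOWN_COLUMNS.get(table, set())
    }

# An agent-generated query that guesses a column name wrong:
sql = "SELECT MATCHES.HOME_TEAM, SUM(TICKET_SALES.PRICE) FROM TICKET_SALES"
print(unknown_identifiers(sql))  # -> {'TICKET_SALES.PRICE'}
```

Nothing in the Option A wiring performs a check like this, which is exactly why every guessed name has to land perfectly.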
Option B — Cortex Analyst (direct API call)
User asks a question in Copilot Studio
The flow calls the Cortex Analyst REST API with a semantic model
Analyst generates SQL grounded in the semantic YAML, executes it, returns results
Copilot summarizes the response
Snowflake does the text-to-SQL. Copilot still owns the conversation, the synthesis, and any orchestration.
How it's wired: Option B uses a Power Automate Agent Flow that calls a Snowflake stored procedure:
CALL COPILOT_DEMO.PUBLIC.CALL_CORTEX_FOOTBALL_AGENT('<prompt>');
The procedure wraps the Cortex Analyst invocation on the Snowflake side. The flow takes the user's natural-language prompt, passes it to the procedure, and returns the result back to Copilot Studio. Why this design: Cortex Analyst's REST API isn't directly exposed as a first-class action in the Power Platform connector catalog. Wrapping the call in a stored procedure means the standard Snowflake connector can invoke it, and the heavy lifting — semantic-model lookup, SQL generation, execution — stays inside Snowflake where it belongs. The Agent Flow is essentially a thin pipe: prompt in, grounded result out.
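For orientation, here's a sketch of the request body the stored procedure assembles before POSTing to Cortex Analyst's message endpoint. The stage path is hypothetical, and the field names follow Snowflake's API documentation at the time of writing — verify them against the current reference before relying on this shape:

```python
import json

# Hypothetical stage path for the semantic model YAML.
SEMANTIC_MODEL = "@COPILOT_DEMO.PUBLIC.MODELS/football_ticketing.yaml"

def analyst_request(prompt: str) -> str:
    """Build the JSON body for Cortex Analyst's message endpoint.

    This mirrors what a wrapper procedure like CALL_CORTEX_FOOTBALL_AGENT
    would send to /api/v2/cortex/analyst/message on the user's behalf.
    Field names are taken from Snowflake's docs at time of writing.
    """
    return json.dumps({
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "semantic_model_file": SEMANTIC_MODEL,
    })

body = analyst_request("Which match had the highest ticket revenue?")
```

The Agent Flow never sees any of this — it just passes the prompt string to the procedure and gets grounded results back.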
Option C — Snowflake Data Agent (Cortex Agent)
User asks a question in Copilot Studio
The flow calls a Snowflake Cortex Agent
The Agent decides which tools to use — Analyst for structured data, Cortex Search for unstructured, custom tools for the rest — and runs a multi-step plan if needed
Copilot returns the grounded response text
How it's wired: Option C calls a custom HTTP endpoint instead of going through a Power Platform connector:
POST https://[instance].azurewebsites.net/api/ask
The endpoint is a lightweight Azure Web App that proxies between Copilot Studio and the Cortex Agent's REST API. Why this design: the Cortex Agent API needs request signing, authentication handling, and — critically — conversation state across multi-step reasoning. Doing all of that inside Copilot Studio's connector framework gets messy fast. Centralizing it in an Azure-hosted middleware means Copilot Studio just makes a clean HTTP POST with the prompt, and everything else (auth, agent session management, tool orchestration, response shaping) is handled server-side.
Bonus: the same /api/ask endpoint can be reused by Teams bots, web apps, or anything else that can speak HTTP — so the Agent investment isn't locked to Copilot Studio.
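The conversation-state part of that middleware is the piece that's easy to underestimate. A minimal sketch of what the proxy has to do between POSTs — everything here is illustrative, not the actual Web App code, and a production version would persist sessions in something like Redis rather than process memory:

```python
from collections import defaultdict

# In-memory conversation store, keyed by a conversation id the caller
# (Copilot Studio, a Teams bot, etc.) sends with each /api/ask request.
_sessions: dict[str, list[dict]] = defaultdict(list)

def build_agent_payload(conversation_id: str, prompt: str) -> dict:
    """Append the user turn and return the full message history the
    proxy would forward to the Cortex Agent REST API, so multi-step
    reasoning keeps its context across stateless HTTP calls."""
    _sessions[conversation_id].append({"role": "user", "content": prompt})
    return {"messages": list(_sessions[conversation_id])}

def record_agent_reply(conversation_id: str, text: str) -> None:
    """Store the Agent's answer so the next turn sees it."""
    _sessions[conversation_id].append({"role": "assistant", "content": text})

payload = build_agent_payload("conv-1", "Top 5 matches by attendance?")
record_agent_reply("conv-1", "Here are the top 5 ...")
payload = build_agent_payload("conv-1", "Now just home games.")
print(len(payload["messages"]))  # -> 3: the follow-up carries prior turns
```

This is the "agent session management" the post refers to — the part that gets messy fast if you try to do it inside the connector framework.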
The semantic model: shared between B and C
This is the bit most teams overlook on day one and pay for later.
Heads up: Options B and C share the same semantic model. Both Cortex Analyst (Option B) and the Snowflake Data Agent (Option C) read from the same semantic YAML and the same set of business-friendly views. The Agent in Option C uses Analyst as one of its tools — so any improvement to the semantic model lifts both options at once.
The practical implication: invest in defining metrics, dimensions, and join paths once, adopt B today, graduate to C later — and you don't redo that work. Option A doesn't benefit from the semantic model at all because it generates SQL straight off the raw schema. Which is exactly why it's the most fragile of the three.
The semantic views are doing real work for B and C:
Reducing join ambiguity (no more “which customer_id did you mean?”)
Presenting business-friendly metrics and names instead of raw column salad
Making text-to-SQL repeatable across paraphrased prompts
Lowering schema and column hallucination risk
If your Cortex answers feel mediocre, the semantic model is usually where to look first.
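For a sense of what "investing in the semantic model" looks like in practice, here's an illustrative fragment for the football-ticketing domain. The table and column names are hypothetical, and the field names follow Snowflake's semantic model YAML spec at the time of writing — check the current spec before copying the shape:

```yaml
# Illustrative shape only -- names are hypothetical.
name: football_ticketing
tables:
  - name: ticket_sales
    description: One row per ticket sold.
    base_table:
      database: COPILOT_DEMO
      schema: PUBLIC
      table: TICKET_SALES
    dimensions:
      - name: home_team
        expr: HOME_TEAM
        data_type: varchar
        synonyms: ["club", "host team"]
    time_dimensions:
      - name: sale_date
        expr: SOLD_AT
        data_type: date
    measures:
      - name: ticket_revenue
        expr: TICKET_PRICE
        data_type: number
        default_aggregation: sum
        synonyms: ["revenue", "ticket sales"]
```

The synonyms and descriptions are what make paraphrased prompts land on the same SQL — that's where the repeatability comes from.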
Real-world scenario
Domain: football ticketing analytics
Why this scenario: heavy joins, repeated aggregations, and the kind of natural-language ambiguity that exposes weak text-to-SQL fast
Evaluation prompt set: 30 natural-language questions imported from CSV using Agent Evaluation in Copilot Studio.
Held constant across all three options:
- Same Snowflake environment and connector auth model
- Same dataset and schema
- Same 30-prompt evaluation set
- Same evaluation process in Copilot Studio
- Options B and C used the same semantic model
Out of scope (so nobody emails me about it): large-scale performance benchmarking, multi-language testing, and unstructured search. All earmarked for the next iteration.
Results: 30 questions, three patterns
| Pattern | Success rate | 422 errors | Other errors |
| --- | --- | --- | --- |
| A: Direct SQL connector | 53% (16/30) | 14 (all failures) | 0 |
| B: Cortex Analyst | 63% (19/30) | 1 | 10 (accuracy) |
| C: Cortex Agent | 67% (20/30) | 0 | 10 (accuracy) |
A few things jump out from the numbers.
Option A failed loudly. Every single failure was a 422 — the agent generated SQL that didn't line up with the database schema. Wrong table names, wrong column names, joins that don't exist. From a user's perspective: “the assistant is broken.”
Option B nearly eliminated 422s. The semantic model did its job — Analyst grounded SQL generation against curated views, so structural errors collapsed from 14 down to 1. The remaining failures shifted to accuracy problems: the query ran fine, but the answer didn't quite match what the user meant. That's a much better failure mode — you're now arguing about interpretation, not whether the agent can speak SQL.
Option C eliminated 422s entirely and edged accuracy up. The Cortex Agent's tool routing handled a few prompts that Analyst alone couldn't, which produced the small lift in success rate. The remaining failures are all accuracy-related — the same class as Option B — which suggests the next round of improvement should target the semantic model and prompt rubric, not the orchestration pattern.
The headline insight: the semantic model didn't just lift accuracy, it eliminated the failure mode that made Option A feel unreliable. Going from 14 → 1 → 0 structural errors changes the user's perception of trust far more than the success-rate jump suggests.
Pros and cons, side by side
Option A — Standard SQL Connector
Pros
- Fastest path to a working demo
- Lowest latency (one hop, no extra services)
- No new Snowflake objects required
- No Cortex credits in the bill
- Full transparency — the agent's prompt is the whole story
Cons
- Highest hallucination risk on table and column names
- 422 errors when generated SQL is malformed (in our test: every failure was one of these)
- Inconsistent answers across rephrased questions
- No semantic layer means no shared definition of “active customer,” “revenue,” etc.
- Logic is trapped inside the Copilot agent — not reusable elsewhere
- Multi-step questions tend to fall apart
Option B — Cortex Analyst
Pros
- Semantic model enforces business metric definitions
- Noticeably better text-to-SQL accuracy than agent-generated SQL
- Near-elimination of 422 errors (1 of 11 failures in our test)
- More consistent across paraphrased prompts
- Governance and definitions live in version-controlled YAML
- Predictable single-purpose surface — easy to reason about
Cons
- Single-shot only — no tool routing, no follow-up reasoning
- No native unstructured/document search
- Requires building and maintaining the semantic model (real work, not a wizard click)
- Copilot still owns response synthesis, so you're still prompt-engineering
- Cortex consumption shows up on the Snowflake bill
- Questions that need more than one query won't land
Option C — Snowflake Data Agent (Cortex Agent)
Pros
- Multi-tool orchestration: Analyst + Cortex Search + custom tools in one place
- Handles multi-step questions and follow-ups without bespoke flow logic
- Zero structural (422) errors in our test
- Blends structured and unstructured answers naturally
- Conversation logic is centralized and reusable across surfaces (Copilot, Teams, web, Snowflake Intelligence)
- Best end-user feel for ambiguous prompts — fewer “did you mean…” iterations
Cons
- Most setup of the three (semantic model, agent config, tool definitions, permissions)
- Two layers of orchestration means two layers of debugging when something goes sideways
- Highest latency — more hops, sometimes multi-step plans
- Highest Snowflake-side cost (Agent + Analyst + Search credits stack)
- Less direct control from Copilot Studio over how individual questions are handled
- Capability surface is still evolving — expect changes
When to pick which
A gut-check guide, not a verdict:
| Situation | Lean toward |
| --- | --- |
| Demo or POC, narrow schema, predictable users | A |
| One well-defined analytics domain, repeat business questions, need consistent metrics | B |
| Mixed structured + unstructured, multi-step questions, multiple consuming surfaces | C |
| Heavy governance needs and a dedicated data platform team | B or C |
| “We need this working by Friday” | A, then plan the migration |
Wrapping up
A few takeaways from the build:
The semantic model is the leverage point. Going from no semantic layer (A) to a curated one (B) cut structural errors by 93%. That's a bigger user-experience win than the 10-point accuracy bump.
B and C share infrastructure. If you build the semantic model right, the migration from B to C is mostly orchestration plumbing — not a rewrite.
C earns its setup cost when prompts get messy. On well-shaped questions, B and C are close. On ambiguous, multi-step, or mixed structured/unstructured questions, the Agent's tool routing pulls ahead.
Accuracy errors are a different conversation. Once you eliminate 422s, the remaining gap is about how well your semantic model represents the business — not the orchestration pattern.
The integration mechanism scales with the architecture. Connector → stored procedure via Agent Flow → Azure-hosted HTTP middleware. Each step adds capability, and each step adds something else to debug when things break.
Microsoft recently shipped Snowflake as a knowledge source for Copilot Studio (public preview, with GA reached on May 19, 2025). Makers point Copilot Studio at specific Snowflake tables, and the platform handles the schema awareness, query generation, joins, and filters in the background — no SQL written by hand, no semantic YAML required to get started, and the data never leaves Snowflake. Auth is OAuth via Microsoft Entra ID, so governance stays clean.
Why this matters for the three patterns above: the knowledge-source path effectively gives the agent schema grounding for free. That's the single biggest factor that made Option A unreliable in our test — the agent was guessing column names against a raw schema with no help. With Snowflake-as-knowledge plugged in, that guesswork shrinks dramatically, and the lift compounds with whatever pattern you pick on top.
So if I were starting a Copilot-on-Snowflake build today, I'd wire up Snowflake as a knowledge source first — it's the lowest-friction way to raise the floor on accuracy across every option — and then layer Cortex Analyst (B) or the Cortex Agent (C) on top as the workload demands more semantic depth or multi-tool reasoning. The semantic model is still where the long-term leverage lives; the knowledge source just gets you closer to "useful" on day one.
Useful resources
- Snowflake guide: https://www.snowflake.com/en/developers/guides/getting-started-with-microsoft-copilot-studio-and-cortex-agents/
- Power Platform connector guide: https://learn.microsoft.com/en-us/connectors/snowflakev2/
- Microsoft Learn — Add Snowflake as a knowledge source: https://learn.microsoft.com/en-us/power-platform/release-plan/2025wave1/microsoft-copilot-studio/add-snowflake-as-knowledge-source