Streaming the Claude Agent SDK to a Browser via WebSockets

Stream Claude Agent SDK responses to the browser via WebSockets — architecture, human-in-the-loop approval, and why SSE fell short.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.

We’re building an agent mode for QueryPlane. Instead of manually wiring up queries, transforms, and components, a user describes what they want and an AI agent builds it—executing SQL against their database, reading schemas, configuring components, and assembling full interactive apps. The agent runs on Anthropic’s Claude Agent SDK, which means we need to stream multi-turn agentic conversations to the browser in real time, including a step where the user approves or rejects destructive actions before they execute.

Our first instinct was to use Server-Sent Events (SSE) to stream agent responses to the frontend. It worked for basic text streaming. Then we hit tool use and the human-in-the-loop approval flow, and everything got complicated. This post is about why we switched to WebSockets, what the final architecture looks like, and the gotchas we ran into along the way.

In this post, we’ll cover:

Why SSE fell short - The bidirectional communication problem when Claude needs tool approval
The tool_use pattern - How the agentic loop works and why certain tools require user input
The WebSocket architecture - A three-tier chain from React to Go to a Python agent running the Claude Agent SDK
The approval flow - How we block tool execution while the user reviews changes
Gotchas - Concurrent writes, asyncio coordination, reconnection, and session persistence

How Claude’s agentic loop works

The Claude Messages API supports tool use—you describe the tools available (name, description, input schema), and Claude decides when and how to invoke them. When Claude wants to use a tool, the response includes a tool_use content block and finishes with stop_reason: "tool_use" instead of "end_turn". You execute the tool, send back a tool_result, and Claude continues. This creates an agentic loop: Claude reasons, calls a tool, reads the result, reasons more, maybe calls another tool, and eventually produces a final answer.

{
  "stop_reason": "tool_use",
  "content": [
    { "type": "text", "text": "I'll apply those component changes now." },
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "applyQueriesAndComponents",
      "input": {
        "config": {
          "queries": [{"slug": "get_users", "body": "SELECT * FROM users"}],
          "components": [{"type": "table", "querySlug": "get_users"}]
        }
      }
    }
  ]
}

For QueryPlane, we expose four tools to Claude. Three are safe to auto-execute: get_connection_schema (reads database structure), execute_query (runs a test query), and validateQueriesAndComponents (validates a proposed app config). The fourth—applyQueriesAndComponents—actually mutates the app. It changes queries and components in a user’s live workspace. That one requires explicit user approval before it runs.

This is the human-in-the-loop pattern that Anthropic recommends for agentic applications: high-impact actions stay behind user approval. The agent proposes changes, the user reviews a diff of what will change, and only then does it execute.

Why SSE fell short

Claude doesn’t prescribe a particular streaming protocol—it’s an HTTP API, and how you deliver responses to your own frontend is entirely up to you. Our first prototype chose SSE because the data appeared unidirectional: the backend streams text to the browser, the browser renders it. Simple enough. The Go backend called the Claude API, read the response, and forwarded chunks to the frontend over an SSE stream.

Browser                   Go Backend                Claude API
  |--- POST /chat -------->|                            |
  |                         |--- POST /messages -------->|
  |                         |<---- response chunks ------|
  |<---- SSE: text_delta ---|                            |
  |<---- SSE: text_delta ---|<---- response chunks ------|
  |<---- SSE: done ---------|                            |

This worked for streaming text. The problem appeared when Claude returned a tool_use block for applyQueriesAndComponents. The backend needed to show the user what would change, wait for the user to click approve or reject, then send that decision back to the agent before execution could continue.

With SSE, the approval has to travel back through a separate HTTP POST to some /approve endpoint. This is technically possible but introduces several issues that compounded in practice.

Session correlation. The SSE stream and the approval POST are two independent HTTP connections. The backend needs to match the incoming POST to the correct agent session. This means generating session IDs, storing pending approval state, and cleaning it up on timeout or disconnection. With WebSockets, the session is the connection itself—there’s nothing to correlate.

Connection lifecycle during idle waits. When a tool needs approval, the agent session pauses. The user might take thirty seconds to review the proposed changes, or they might step away for five minutes. SSE connections can drop during these idle periods—reverse proxies and load balancers often have default idle timeouts around 60 seconds. The browser’s EventSource API does auto-reconnect, but re-establishing the SSE stream means the backend needs to track where the stream left off and resume from that point. You end up maintaining an idle connection that exists solely to wait for a response it can’t receive.

Multiple round trips per agent turn. Each tool_use → tool_result cycle is a separate round trip to Claude. The backend calls Claude, gets back a tool_use, waits for approval, executes, then calls Claude again with the result appended. On the frontend side, the backend needs to either hold the SSE connection open across these multiple round trips, or establish a new one each time. Both options add lifecycle complexity that a persistent WebSocket avoids entirely.

No backpressure from the client. If the user is reviewing one tool call and Claude’s response to a previous result triggers another, SSE has no mechanism for the client to signal “hold on, I’m still reviewing.” With WebSockets, the server simply waits for a message from the client before continuing.

Any one of these issues is solvable in isolation. Together, they create a system where the SSE connection between backend and frontend is doing a poor job of pretending to be bidirectional. We stepped back and acknowledged that what we actually needed was a bidirectional protocol.

The architecture: a three-tier WebSocket chain

The final architecture has three tiers connected by two WebSocket connections:

React Frontend ←— WebSocket —→ Go Backend ←— WebSocket —→ Python Agent ←→ Claude Agent SDK

The React frontend opens a WebSocket to the Go backend. The Go backend, on each chat message, opens a second WebSocket to a Python agent orchestrator that wraps the Claude Agent SDK. The Python agent manages the conversation with Claude, exposes tools via MCP (Model Context Protocol), and streams events back through the chain.

We split the backend into Go and Python for a practical reason: the official Claude Agent SDK is available in Python and TypeScript, not Go. Community Go ports exist, but we wanted the stability of the official SDK for the core agentic loop. Go handles what it’s good at—HTTP routing, authentication, WebSocket management, database access—while Python handles the Claude conversation.

Message types on the wire

The frontend sends three message types to Go:

// User sends a chat message
{ "type": "chat", "message": "Build a users dashboard", "chatId": "...", "connectionId": "..." }

// User approves or rejects a proposed change
{ "type": "approval_response", "approvalId": "...", "approved": true }

// Keepalive
{ "type": "ping" }

The backend streams events back:

// Text content streaming token by token
{ "type": "stream", "content": "I'll start by reading your database schema..." }

// Claude wants to use a tool
{ "type": "tool_use", "tool": "get_connection_schema", "toolId": "...", "input": {...} }

// Tool finished executing
{ "type": "tool_result", "toolId": "...", "result": {...}, "isError": false }

// A destructive tool needs user approval before running
{ "type": "approval_required", "approvalId": "...", "changes": {...}, "diff": {...} }

// Agent turn is complete
{ "type": "complete", "totalCost": 0.03, "durationMs": 4200 }

How Go bridges the two WebSockets

When the frontend sends a chat message, Go authenticates the request, loads the app context (connection details, existing queries, component config), and opens a WebSocket to the Python agent. It forwards the message along with the app context, then enters a read loop that streams events from Python back to the frontend.

// Connect to the Python agent (internal, authenticated)
func connectToPython(orgID string) (*websocket.Conn, error) {
    header := http.Header{}
    header.Set("X-Internal-Secret", Manager.GetInternalSecret())
    header.Set("X-Org-ID", orgID)

    url := fmt.Sprintf("ws://127.0.0.1:%s/ws", agentPort)
    conn, _, err := websocket.DefaultDialer.Dial(url, header)
    return conn, err
}

The critical detail is how approval responses flow. When the frontend sends an approval_response, the Go handler needs to forward it to the Python agent—but the read loop for Python events is running in a separate goroutine. We track the active Python connection behind a mutex so the approval handler can reach it:

var (
    pythonMu   sync.Mutex
    pythonConn *wsConn
)

// In the chat handler goroutine:
pythonMu.Lock()
pythonConn = pyConn
pythonMu.Unlock()

// In the main WebSocket read loop, when approval_response arrives:
case "approval_response":
    pythonMu.Lock()
    pc := pythonConn
    pythonMu.Unlock()
    if pc != nil {
        pc.WriteJSON(map[string]interface{}{
            "type":       "approval_response",
            "approvalId": msg.ApprovalID,
            "approved":   msg.Approved,
        })
    }

How the Python agent orchestrates Claude

The Python agent uses the Claude Agent SDK’s ClaudeSDKClient to manage the conversation. Tools are exposed via an MCP server that the SDK discovers automatically. When Claude calls a tool, the SDK invokes the corresponding Python function.

from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions, tool

@tool(name="get_connection_schema", description="Get the database schema")
async def get_connection_schema(connection_id: str) -> dict:
    # Calls Go's internal API to fetch schema
    return await call_internal_api("/ai/get-connection-schema", {
        "connectionId": connection_id
    })

@tool(name="execute_query", description="Execute a SQL query for testing")
async def execute_query(connection_id: str, query_body: str) -> dict:
    return await call_internal_api("/ai/execute-query", {
        "connectionId": connection_id,
        "queryBody": query_body
    })

Three of the four tools auto-execute through this pattern—Claude calls them, the SDK runs them, the result flows back to Claude, and the conversation continues. The frontend sees tool_use and tool_result events stream by in real time but doesn’t need to intervene.

The fourth tool is different.

See what QueryPlane can build for you

Connect to your database, write SQL with AI, and build shareable apps — all from your browser.

Get Started Book a Demo

The approval flow

applyQueriesAndComponents is the tool that actually mutates the user’s app. When Claude calls it, the tool doesn’t execute immediately. Instead, it computes a diff of what would change, sends an approval_required event to the frontend through the WebSocket chain, and blocks until the user responds.

async def apply_queries_and_components(config: dict) -> dict:
    queries = config.get("queries", [])
    components = config.get("components", [])

    # Compute what would change
    diff = await compute_diff(app_id, {"components": components}, queries)

    # Create a pending approval and notify the frontend
    approval_id = create_pending_approval()
    await ws.send_json({
        "type": "approval_required",
        "approvalId": approval_id,
        "changes": {"queries": queries, "components": components},
        "diff": diff,
    })

    # Block until the user responds (5-minute timeout)
    result = await wait_for_approval(approval_id, timeout=300.0)

    if result.get("approved"):
        await apply_changes(app_id, queries, components)
        return {"success": True, "diff": diff}
    else:
        return {"success": False, "error": "User rejected the changes"}

The blocking mechanism uses asyncio.Event. When the tool calls wait_for_approval, it awaits an event that gets set when the approval response arrives on the WebSocket:

_pending_approvals: dict[str, asyncio.Event] = {}
_approval_results: dict[str, dict] = {}

def create_pending_approval() -> str:
    approval_id = str(uuid.uuid4())
    _pending_approvals[approval_id] = asyncio.Event()
    return approval_id

async def wait_for_approval(approval_id: str, timeout: float = 300.0) -> dict:
    event = _pending_approvals[approval_id]
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
        return _approval_results.pop(approval_id)
    except asyncio.TimeoutError:
        return {"approved": False, "error": "Approval timed out"}

def resolve_approval(approval_id: str, approved: bool):
    _approval_results[approval_id] = {"approved": approved}
    _pending_approvals[approval_id].set()  # Unblock wait_for_approval

On the frontend, the approval_required event renders a diff showing exactly what queries and components will be added, modified, or removed. The user clicks Accept or Reject, which sends the approval_response back through the WebSocket chain.

// When approval_required arrives
if (data.type === "approval_required") {
  setPendingApproval({
    approvalId: data.approvalId,
    changes: data.changes,
    diff: data.diff,
  });
}

// User's decision
function handleApproval(approved: boolean) {
  ws.send(JSON.stringify({
    type: "approval_response",
    approvalId: pendingApproval.approvalId,
    approved,
  }));
}

This is the pattern that SSE couldn’t support cleanly. With WebSockets, the approval request and response travel over the same connection, the tool blocks naturally using an asyncio primitive, and there’s no session correlation to manage.

Gotchas

The architecture is straightforward on a whiteboard. The implementation had sharp edges.

Concurrent WebSocket writes need a mutex

gorilla/websocket does not support concurrent writes to a single connection. If the goroutine streaming events from Python is writing text deltas while the main handler tries to forward an approval response, the connection corrupts silently or panics. We wrap every WebSocket connection in a struct with a sync.Mutex:

type wsConn struct {
    conn *websocket.Conn
    mu   sync.Mutex
}

func (w *wsConn) WriteJSON(v interface{}) error {
    w.mu.Lock()
    defer w.mu.Unlock()
    return w.conn.WriteJSON(v)
}

An alternative is funneling all outbound messages through a single channel, but the mutex is simpler and the write frequency is low enough that contention isn’t an issue.

The Python read loop must not block on chat processing

The Python agent’s WebSocket handler reads messages in a loop. When a chat message arrives, it kicks off the agent conversation with Claude, which can take many seconds as tools execute and responses stream back. If the handler processes the chat synchronously, the read loop blocks—and when an approval_response arrives, nobody’s listening.

The fix is to run the chat handler as a background asyncio.Task so the read loop stays free to receive approval responses:

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    chat_task: asyncio.Task | None = None

    while True:
        raw = await websocket.receive_text()
        msg = json.loads(raw)

        if msg["type"] == "chat":
            # Run as a background task so approvals can still be received
            chat_task = asyncio.create_task(handle_chat(websocket, msg))
        elif msg["type"] == "approval_response":
            # This arrives while chat_task is awaiting approval
            resolve_approval(msg["approvalId"], msg["approved"])

Without this, the agent hangs forever waiting for an approval that the WebSocket handler can’t receive. This was one of the subtler bugs we hit—everything worked until the first tool that needed approval, at which point the entire session deadlocked.

Session persistence across reconnections

If the WebSocket drops mid-conversation (network blip, user navigates away and comes back), the frontend needs to resume where it left off. We persist the Claude session ID to the database when the Python agent reports it:

if event.Type == "session_update" && event.ClaudeSessionID != "" {
    DB.Exec(
        `UPDATE ai_chats SET claude_session_id = $1 WHERE id = $2`,
        event.ClaudeSessionID, chatID,
    )
}

On reconnect, Go loads the session ID from the database and passes it to the Python agent, which uses it to resume the Claude conversation. This avoids replaying the entire conversation history on every reconnection.

Reconnection with exponential backoff

The frontend reconnects automatically when the WebSocket closes unexpectedly. We use exponential backoff capped at 30 seconds to avoid hammering the backend during an outage:

ws.onclose = () => {
  if (!cancelled) {
    const delay = Math.min(1000 * 2 ** attempt, 30000);
    attempt++;
    reconnectTimeout = setTimeout(connect, delay);
  }
};

Message accumulation before saving

The Go backend accumulates streamed text as it arrives and only persists it to the database at transition points—before a tool call, after a tool result, or when the agent turn completes. This avoids writing to the database on every text delta (which would be hundreds of writes per response) while ensuring the conversation history is saved correctly:

var fullResponse strings.Builder

case "stream":
    fullResponse.WriteString(event.Content)
    frontend.WriteJSON(event) // Stream to browser immediately

case "tool_use":
    // Save accumulated text before the tool call
    if text := fullResponse.String(); text != "" {
        saveMessage(chatID, "assistant", text)
        fullResponse.Reset()
    }

case "complete":
    // Save any remaining text
    if text := fullResponse.String(); text != "" {
        saveMessage(chatID, "assistant", text)
    }

Wrapping up

SSE works when data flows one direction. The moment your agent needs human approval for tool calls—which it should, for anything destructive—you need bidirectional communication between the browser and your backend. WebSockets are the natural fit.

Our final architecture is a three-tier WebSocket chain: React frontend to Go backend to a Python agent running the Claude Agent SDK. Go handles authentication, persistence, and WebSocket management. Python handles the agentic loop and tool orchestration. The approval flow uses asyncio.Event to block tool execution until the user responds, with the WebSocket carrying the approval request and response on the same connection.

If you’re building something similar, design for the approval flow from the start. It’s tempting to start with SSE because the data appears unidirectional at first—Claude streams text, you render it. But the first time Claude calls a tool that needs user confirmation, you’ll need a way for the client to talk back. Starting with WebSockets saves you the rewrite we had to do.