Security

Your WebSocket Is an Open Door

The real-time stream carries every tool result, every status update, and accepts commands to cancel runs and answer agent questions. Conversation UUIDs are not secrets. Without cryptographic auth and org-ownership verification, the WebSocket is read-write access to every running agent.

Jyhad Aamri, Architect of Decision Systems · 9 min read

The Most Sensitive Channel in the Platform

When an AI agent runs, a real-time stream carries the output to the frontend. Token deltas as the LLM generates text. Tool call starts and completions. Status updates. Plan proposals. Approval requests. Every event the agent produces flows through this stream.

In most platforms, this is a WebSocket connection. The client connects, the server pushes events, and the user watches the agent work in real time.

But the WebSocket is not read-only. It is bidirectional. The client can send commands back to the server:

Cancel a running agent. A cancel message tells the server to abort the active agent run. The agent stops. Whatever it was doing is interrupted.

Respond to agent questions. When an agent asks the user a question (approval requests, clarification, decisions), the response goes back through the WebSocket. The agent acts on that response. Whatever the response says, the agent treats as user input.

This is not a passive data stream. It is an interactive control channel. Read access exposes every tool result, including results that may contain sensitive business data. Write access controls agent behavior: what it does, when it stops, and what decisions it makes.

The question is: who can connect?

Conversation UUIDs Are Not Secrets

Most WebSocket implementations key on the conversation identifier. The client connects to a URL that includes the conversation ID. The server looks up the conversation and starts streaming events.

The implicit assumption: if you know the conversation ID, you are authorized to access it.

This assumption is wrong. Conversation IDs are not secrets. They appear in:

Browser URLs. The conversation page URL contains the ID. Anyone who sees the URL (screen share, screenshot, shoulder surfing, browser history on a shared machine) has the conversation ID.

API responses. List endpoints return conversation metadata including IDs. Any client with API access can enumerate conversation IDs.

Browser history. Every conversation the user has visited is in their browser history. On a shared machine, the next user can see every conversation ID the previous user accessed.

Log files. Server logs, access logs, and application logs frequently include conversation IDs for debugging and correlation. Anyone with log access has conversation IDs.

Referer headers. If the conversation page links to an external resource, the HTTP Referer header may include the conversation URL with the ID.

A conversation ID is an identifier, not a credential. Treating it as a credential means that every URL share, every browser history entry, every log line, and every API response is a potential authentication bypass. The attack surface is not "someone guesses a UUID." The attack surface is "someone sees a URL."

If knowing the conversation ID is sufficient to connect, then every shared URL is an access token.

What an Unauthenticated WebSocket Exposes

An attacker who connects to an unauthenticated WebSocket for an active conversation can:

Read all agent output in real time. Every token the LLM generates. Every tool result. Every status update. If the agent is querying a CRM, the attacker sees the CRM data. If the agent is reading emails, the attacker sees the email content. If the agent is generating a report, the attacker sees the report as it is written.

Cancel the agent mid-run. A cancel command aborts the agent. If the agent is in the middle of a critical workflow (processing a customer refund, updating a database, completing a migration), the cancellation leaves the workflow in a partial state. This is a denial of service attack that can cause data inconsistency.

Hijack agent decisions. When an agent asks the user a question ("Should I proceed with the refund?" "Which priority level for this ticket?" "Approve this API call?"), the response comes through the WebSocket. An attacker who sends a response before the legitimate user does controls the agent's next action. The agent cannot distinguish between the real user's response and the attacker's. It acts on whichever arrives first.

Monitor activity patterns. Even without reading the content, the stream of events reveals when agents are active, what tools they use, how long runs take, and how frequently they execute. This is operational intelligence that reveals business processes and workflow patterns.

Four attack classes. All from a single unauthenticated WebSocket connection. All enabled by the assumption that conversation IDs are secrets.

The Fix: Cryptographic Auth with Org-Ownership Verification

HeartBeatAgents requires cryptographic authentication on every WebSocket connection. The authentication is not based on the conversation ID. It is based on a signed token that proves two things: who the user is, and which organization they belong to.

The browser WebSocket API does not let scripts set custom HTTP headers on the upgrade request, so you cannot send an Authorization header from the browser. This is a well-known limitation of the browser API. A common workaround is to pass the authentication token as a query parameter on the connection URL.

The token is a signed JWT (JSON Web Token) with a verified signature and an expiration time. Without the signing key, it cannot be forged, and any modification invalidates the signature. It expires. And it carries the user's organization identity in its payload.

Five-Step Validation

Every WebSocket connection passes through a five-step flow, four validation checks followed by the accept, before a single event is streamed. If any check fails, the connection is closed immediately. No events are sent. No commands are accepted.

WebSocket authentication flow:

1. Extract token from connection request. Missing? → Connection refused.
2. Verify cryptographic signature + expiration. Invalid or expired? → Connection refused.
3. Extract organization identity from token. Missing org identity? → Connection refused.
4. Verify conversation belongs to this organization. Different org or not found? → Connection refused.
5. Accept connection. Begin event relay.

Step 1: Token extraction. The signed token is extracted from the connection request. If no token is present, the connection is refused immediately. There is no anonymous mode. There is no "read-only without auth" fallback. No token, no connection.

Step 2: Signature and expiry verification. The token's cryptographic signature is verified against the server's signing key. If the signature is invalid (token was forged or modified), the connection is refused. If the token has expired, the connection is refused. The token cannot be reused indefinitely. It has a finite lifetime.

Step 3: Organization identity extraction. The token carries the user's organization identity in its payload. This is not a user-supplied parameter. It is embedded in the signed token at authentication time. The user cannot modify it. They cannot claim to belong to a different organization. The organization identity is as trustworthy as the token signature.

Step 4: Ownership verification. The conversation ID from the URL is checked against the database. Does this conversation belong to the organization in the token? If the conversation belongs to a different organization, the connection is refused. If the conversation does not exist, the connection is refused. This is the multi-tenant isolation check. It runs on every connection, not just the first one.

Step 5: Accept. Only after all four checks pass does the server accept the WebSocket upgrade and begin streaming events. The first event the client receives is proof that all validation passed.
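The five steps above can be sketched as a single gate function that runs before the upgrade is accepted. This is an illustrative sketch, not the platform's code: the `token` query parameter name, the `org_id` claim, the URL shape, and the two injected callables (`verify_token`, which returns claims or `None`, and `conversation_org`, which returns the owning organization or `None`) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class Decision:
    accepted: bool
    reason: str
    org_id: Optional[str] = None


def authorize_connection(
    url: str,
    verify_token: Callable[[str], Optional[Mapping[str, object]]],
    conversation_org: Callable[[str], Optional[str]],
) -> Decision:
    parts = urlsplit(url)
    # Assumed URL shape: .../conversations/<conversation_id>?token=<jwt>
    conversation_id = parts.path.rstrip("/").rsplit("/", 1)[-1]

    # Step 1: extract the token. No token, no connection.
    token = parse_qs(parts.query).get("token", [""])[0]
    if not token:
        return Decision(False, "missing token")

    # Step 2: verify signature and expiry (delegated to the injected verifier).
    claims = verify_token(token)
    if claims is None:
        return Decision(False, "invalid or expired token")

    # Step 3: the organization comes from the signed payload, never from the URL.
    org_id = claims.get("org_id")
    if not isinstance(org_id, str) or not org_id:
        return Decision(False, "missing org identity")

    # Step 4: ownership check. "Not found" and "wrong org" get the same answer.
    owner = conversation_org(conversation_id)
    if owner is None or owner != org_id:
        return Decision(False, "different org or not found")

    # Step 5: only now may the server accept the upgrade and relay events.
    return Decision(True, "accepted", org_id)
```

Returning the same reason for a missing conversation and for one owned by another organization is a deliberate choice in this sketch: it avoids telling a caller from another tenant which conversation IDs exist.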

Multi-Tenant Isolation

Step 4 is the most critical. It is the difference between "authenticated user" and "authorized user." Authentication proves who you are. Authorization proves you have access to this specific resource.

Multi-tenant isolation:

Org A (valid token) connects to Org A conversation → Accepted (ownership verified)
Org A (valid token) connects to Org B conversation → Refused (wrong organization)
Org B (valid token) connects to Org B conversation → Accepted (ownership verified)
Org B (valid token) connects to Org A conversation → Refused (wrong organization)
No token connects to any conversation → Refused (no authentication)

A valid token for Organization A grants access to Organization A's conversations. It grants zero access to Organization B's conversations. The token is valid. The signature checks out. The user is authenticated. But they are not authorized for that specific conversation because the conversation belongs to a different organization.

This prevents the most dangerous multi-tenant attack: a legitimate user of one organization accessing another organization's data. The attacker does not need to forge a token. They have a real one. They are a real user. They just belong to the wrong organization. Without the ownership check, their valid authentication would grant them access to every conversation on the platform.

The ownership check queries the database on every connection. It is not cached. It does not rely on the token's claims alone (which could be stale if a conversation was transferred between organizations). The source of truth is the database, checked at connection time.
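A minimal sketch of that uncached ownership check, using an in-memory SQLite database as a stand-in for the platform's real datastore. The `conversations` table, its columns, and the function name are hypothetical; the point is only that the query is scoped by both the conversation ID and the organization from the token, and that it runs against the database each time.

```python
import sqlite3


def conversation_belongs_to_org(db: sqlite3.Connection, conversation_id: str, org_id: str) -> bool:
    """True only if the conversation exists AND is owned by org_id right now."""
    row = db.execute(
        "SELECT 1 FROM conversations WHERE id = ? AND org_id = ?",
        (conversation_id, org_id),
    ).fetchone()
    return row is not None


# Demo data: one conversation per organization.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE conversations (id TEXT PRIMARY KEY, org_id TEXT NOT NULL)")
db.executemany(
    "INSERT INTO conversations VALUES (?, ?)",
    [("conv-a", "org-a"), ("conv-b", "org-b")],
)

print(conversation_belongs_to_org(db, "conv-a", "org-a"))  # own conversation
print(conversation_belongs_to_org(db, "conv-b", "org-a"))  # another tenant's conversation

# A transfer between organizations takes effect on the very next connection,
# because nothing about ownership is cached.
db.execute("UPDATE conversations SET org_id = 'org-b' WHERE id = 'conv-a'")
print(conversation_belongs_to_org(db, "conv-a", "org-a"))
```

Because the organization is part of the `WHERE` clause, a conversation that does not exist and a conversation owned by someone else are indistinguishable to the caller.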

Write Operations Through Message Queues

When an authenticated client sends a command through the WebSocket (cancel a run, respond to a question), the command does not go directly to the agent process. It goes through a message queue.

This is a deliberate architectural decision. The WebSocket handler is an edge process. It faces the internet. It accepts connections from browsers. It is the most exposed component in the system. Giving it direct access to agent processes would mean that a compromise of the WebSocket handler is a compromise of the agent runtime.

Instead:

Cancel commands are published to the message queue with the run ID. The agent's runner process polls the queue and checks for cancel signals. The WebSocket handler never touches the runner process. It drops a message in a queue and confirms receipt to the client.

Input responses (answers to agent questions) are published to the message queue with the conversation and input identifiers. The runner process picks up the response when it is ready. The WebSocket handler does not know where the runner is, what process it runs in, or how to reach it. It knows how to publish to a queue. That is the extent of its access.

This separation means a compromised WebSocket handler can publish messages to queues but cannot directly execute code in agent processes, access the database as the agent, or modify agent state outside of the defined message protocol. The blast radius of a WebSocket compromise is limited to the operations the message protocol supports: cancel and input response. Nothing else.
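To make the separation concrete, here is a small sketch in which Python's in-process `queue.Queue` stands in for a real message broker. The message shapes, field names, and function names are assumptions for illustration. The edge handler accepts only the two command types the protocol defines, copies only their expected fields, and publishes; the runner side polls without ever being called by the handler.

```python
import json
import queue

# The only commands the edge handler forwards, with their required string fields.
ALLOWED_COMMANDS = {
    "cancel": ("run_id",),
    "input_response": ("conversation_id", "input_id", "answer"),
}


def handle_client_message(raw: str, outbox: queue.Queue) -> dict:
    """Edge side: validate a client message and publish it. Never touches a runner."""
    try:
        message = json.loads(raw)
    except ValueError:
        return {"ok": False, "error": "malformed message"}
    kind = message.get("type") if isinstance(message, dict) else None
    fields = ALLOWED_COMMANDS.get(kind) if isinstance(kind, str) else None
    if fields is None:
        return {"ok": False, "error": "unsupported command"}
    if not all(isinstance(message.get(name), str) for name in fields):
        return {"ok": False, "error": "missing field"}
    # Copy only the expected fields so nothing extra rides along to the runner.
    outbox.put({"type": kind, **{name: message[name] for name in fields}})
    return {"ok": True}


def poll_commands(inbox: queue.Queue) -> list:
    """Runner side: drain whatever is pending, without blocking."""
    commands = []
    while True:
        try:
            commands.append(inbox.get_nowait())
        except queue.Empty:
            return commands
```

Anything outside the allow-list is rejected at the edge, so even a handler that is tricked into forwarding attacker input can only ever emit a cancel or an input response.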

Token Expiry and Reconnection

Signed tokens expire. This is intentional. A token that never expires is a permanent credential. If it is intercepted (network sniffing, log exposure, browser history), the attacker has permanent access.

Token expiry creates a practical problem: agent runs can last longer than the token lifetime. A user starts watching an agent run. The token expires mid-run. The WebSocket connection is dropped.
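One way a server could enforce expiry on a connection that is already open is to re-check the token's `exp` claim periodically, or schedule a timer for it, and close the socket when it passes. The sketch below shows only that decision logic. The function names are hypothetical, and 1008 is the Policy Violation close code from RFC 6455.

```python
import time
from typing import Mapping, Optional

POLICY_VIOLATION = 1008  # RFC 6455 close code for a policy violation


def seconds_until_expiry(claims: Mapping[str, object], now: Optional[float] = None) -> float:
    """How long the connection may stay open; 0.0 if the expiry is missing or past."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)):
        return 0.0
    return max(0.0, exp - now)


def close_code_if_expired(claims: Mapping[str, object], now: Optional[float] = None) -> Optional[int]:
    """Return the close code to send, or None if the token is still valid."""
    if seconds_until_expiry(claims, now) <= 0.0:
        return POLICY_VIOLATION
    return None
```

A connection handler could call `close_code_if_expired` before relaying each batch of events, or arm a timer with `seconds_until_expiry` when the connection is accepted.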

HeartBeatAgents handles this with a three-step reconnection flow:

Step 1: Server closes the connection with a policy violation code. This is the standard WebSocket close code 1008, which RFC 6455 defines for connections that violate the server's policy; here it is used to signal an authentication or authorization failure. It is distinct from the codes for network errors, server errors, or normal closure, so the client can identify why the connection was dropped.

Step 2: Client recognizes the code and refreshes the token. The frontend's connection handler detects the policy violation code and triggers an automatic token refresh. A new signed token is obtained from the authentication service. No user interaction required.

Step 3: Client reconnects with the fresh token. The new connection goes through all five validation steps again. If the token is valid and the conversation still belongs to the user's organization, the connection is accepted and event streaming resumes.

If reconnection fails (the refresh token has also expired, the user's session has ended, the conversation was deleted), the client uses exponential backoff: 1 second, 2 seconds, 4 seconds, up to a maximum of 30 seconds between attempts. This prevents reconnection storms in which thousands of clients simultaneously hammer the server after a brief outage.
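The reconnection flow and its backoff schedule can be sketched as follows. This is a simplified model, not the actual frontend code: `connect` and `refresh_token` are hypothetical callables supplied by the caller (`connect` returns `None` when the server accepts, otherwise the close code), the attempt limit is an assumption, and the policy violation code is taken to be 1008 as defined in RFC 6455.

```python
import time
from typing import Callable, Iterator, Optional

POLICY_VIOLATION = 1008  # RFC 6455 close code for a policy violation


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield 1, 2, 4, ... seconds, never exceeding the cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def reconnect(
    connect: Callable[[str], Optional[int]],
    refresh_token: Callable[[], Optional[str]],
    token: Optional[str],
    close_code: Optional[int],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 8,
) -> Optional[str]:
    """Return the token that got a connection accepted, or None after giving up."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        if close_code == POLICY_VIOLATION:
            # Auth failure: the old token is useless, so get a fresh one first.
            token = refresh_token()  # None if the refresh itself failed
        if token is not None:
            close_code = connect(token)  # full validation runs again server-side
            if close_code is None:
                return token
        sleep(next(delays))
    return None
```

Passing `sleep` as a parameter keeps the function testable without real waiting. With the defaults, the waits between failed attempts are 1, 2, 4, 8, 16, 30, 30 seconds.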

Fail-Closed in Production

In development, WebSocket connections can be accepted without a token to simplify local testing. This is logged as a warning. It is visible in development logs. It exists because requiring JWT infrastructure for local development creates unnecessary friction.

In production, this path does not exist. A missing token is an immediate connection refusal. There is no "accept and log" mode. There is no "read-only without auth" degradation. The system distinguishes between development and production with a single environment flag, and the behavior difference is absolute: development accepts with a warning, production refuses without discussion.

This fail-closed behavior extends to every validation step. An invalid signature does not degrade to "allow read-only." A missing organization identity does not degrade to "allow if conversation is public." A failed ownership check does not degrade to "allow if the user is an admin." Every failure mode results in the same outcome: connection refused. No partial access. No degraded access. No access.
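A sketch of how a single environment flag can gate the development-only path while staying fail-closed. The `APP_ENV` variable name, the logger name, and the function name are assumptions; the article only says that one flag distinguishes the two modes. The important detail is the default: if the flag is unset or holds an unexpected value, the code behaves as production.

```python
import logging
import os
from typing import Mapping

log = logging.getLogger("ws.auth")


def missing_token_allowed(env: Mapping[str, str] = os.environ) -> bool:
    """Decide what to do when a connection arrives without a token."""
    # Anything other than an explicit "development" is treated as production.
    if env.get("APP_ENV", "production") == "development":
        log.warning("accepting WebSocket connection without a token (development only)")
        return True
    return False
```

Matching on the one permissive value, rather than on the restrictive one, means a typo or a missing variable in a deployment refuses connections instead of silently opening the door.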

A WebSocket that degrades gracefully under authentication failure is a WebSocket that provides unauthenticated access with extra steps.

What to Audit in Your Platform

If your AI agent platform uses WebSockets for real-time streaming, ask these questions:

"What happens if I connect to the WebSocket with just a conversation ID and no authentication token?" If events start streaming, every conversation ID is an access token. Every URL that has ever been shared, logged, or displayed is a potential unauthorized access point.

"Can a user from Organization A connect to Organization B's conversation WebSocket?" If the answer is "no, because they would not know the conversation ID," that is security by obscurity. UUIDs are identifiers, not credentials. If the answer is "no, because the server verifies that the conversation belongs to the token's organization," that is authorization.

"What can an authenticated WebSocket client do beyond reading events?" If the WebSocket only streams events (read-only), the exposure is data access. If the WebSocket also accepts commands (cancel, input response, configuration changes), the exposure is data access plus agent control. The authentication requirements should match the risk: a bidirectional control channel demands stronger auth than a read-only event stream.

"What happens when the authentication token expires during a WebSocket session?" If the connection stays open, the token expiry is meaningless. The user authenticated once and has permanent access for the duration of the connection. If the connection is closed and the client must reconnect with a fresh token, the expiry is enforced. Every reconnection re-validates authentication and organization ownership.

The WebSocket is the most sensitive real-time channel in an AI agent platform. It carries everything the agent produces and accepts commands that control what the agent does. Treating it as a simple event pipe that only needs a conversation ID is treating your most sensitive channel as your least protected one.