OpenAI Adds WebSocket Support to Responses API, Cutting Latency by 40% for AI Agents

By AI Bot

OpenAI has launched WebSocket support for its Responses API, a significant infrastructure upgrade designed to slash latency for long-running AI agents that rely heavily on tool calls. The new mode enables persistent, bidirectional connections that eliminate the overhead of repeated HTTP requests, delivering up to 40% faster end-to-end execution for complex workflows.

Key Highlights

  • Up to 40% latency reduction for workflows involving 20+ tool calls
  • Persistent connections via wss://api.openai.com/v1/responses — no more resending full conversation history each turn
  • Incremental input pattern — only new data (tool outputs, user messages) is sent per turn
  • Warmup optimization — pre-load tools and instructions before the first generation turn
  • Compatible with Zero Data Retention (ZDR) and store=false for privacy-sensitive deployments

How It Works

Instead of the traditional HTTP request-response cycle, WebSocket mode maintains an open connection between the client and OpenAI's servers. After the initial response.create event, subsequent turns chain via previous_response_id and only send incremental inputs — the new tool results or user messages.
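A minimal sketch of the two message shapes this implies. The event type response.create and the previous_response_id field come from the source; the other field names ("model", "instructions", "input" and its contents) are illustrative assumptions, not taken from the official schema:

```python
import json

def first_turn(model, instructions, user_message):
    # Initial response.create event: instructions and input are sent in full,
    # once, when the conversation starts.
    return json.dumps({
        "type": "response.create",
        "model": model,
        "instructions": instructions,
        "input": [{"role": "user", "content": user_message}],
    })

def next_turn(previous_response_id, tool_output):
    # Subsequent turns chain via previous_response_id and carry only the
    # incremental input (here, a tool result) -- not the full history.
    return json.dumps({
        "type": "response.create",
        "previous_response_id": previous_response_id,
        "input": [{"type": "function_call_output", "output": tool_output}],
    })
```

Note that the follow-up payload omits the model, instructions, and prior messages entirely; that omission is where the latency savings come from.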

The server maintains the previous response state in a connection-local in-memory cache, meaning the full context doesn't need to be retransmitted each time. This architecture is particularly beneficial for agentic workflows where the AI repeatedly calls external tools.

A warmup feature allows developers to send generate: false to pre-stage tools and instructions, so the first actual generation turn starts faster.
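Assuming the warmup flag rides on the same event (the generate: false flag is from the source; the surrounding field names are illustrative), a warmup message might be built like this:

```python
import json

def warmup_event(model, instructions, tools):
    # Hypothetical warmup payload: generate: false pre-stages the model,
    # tools, and instructions without producing a generation turn, so the
    # first real turn starts faster.
    return json.dumps({
        "type": "response.create",
        "generate": False,
        "model": model,
        "instructions": instructions,
        "tools": tools,
    })
```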

Why It Matters

As AI agents become more sophisticated, they increasingly rely on chains of tool calls — searching databases, calling APIs, running code, and more. Under the standard HTTP model, each turn requires resending the entire conversation history, a latency bottleneck that grows with every tool call.

Coding assistants like Cursor have already reported a 30% speed boost using the new WebSocket mode. For developers building background AI workers or multi-step agent pipelines, this is a meaningful infrastructure improvement.

Limitations

The WebSocket mode has a 60-minute connection limit, after which clients must reconnect. Only one response can be in-flight per connection (no multiplexing), and failed turns evict their cached state to prevent stale data reuse.

What's Next

The WebSocket mode signals OpenAI's broader push toward supporting always-on, persistent AI agents. As the industry moves from single-prompt interactions to long-running autonomous workflows, low-latency infrastructure like this becomes essential.

Developers can start using WebSocket mode today by connecting to wss://api.openai.com/v1/responses with Bearer token authentication.
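The endpoint URL and Bearer authentication are from the source; the header construction below is the standard pattern, and the commented connect call assumes the third-party websockets package purely as an example client:

```python
import os

WS_URL = "wss://api.openai.com/v1/responses"

def auth_headers(api_key=None):
    # Standard Bearer-token header; reads the conventional OPENAI_API_KEY
    # environment variable if no key is passed explicitly.
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return {"Authorization": f"Bearer {key}"}

# With a WebSocket client library (the third-party `websockets` package is
# an illustrative choice, not a requirement of the API), connecting might
# look like:
#
#   async with websockets.connect(WS_URL,
#                                 additional_headers=auth_headers()) as ws:
#       await ws.send(first_event)
```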


Source: OpenAI — WebSocket Mode Documentation

