News · May 8, 2026 · 6 min read

OpenAI Launches GPT-Realtime-2 with GPT-5-Class Reasoning for Live Voice Apps

OpenAI unveiled three new realtime voice models on May 7, 2026 — GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper — bringing GPT-5-class reasoning, 128K context, and live multilingual translation to the Realtime API.

OpenAI launched three new realtime voice models on May 7, 2026, headlined by GPT-Realtime-2 — the company's first voice model with GPT-5-class reasoning. The release also introduces GPT-Realtime-Translate for live multilingual speech and GPT-Realtime-Whisper for ultra-low-latency streaming transcription, all available through the Realtime API and Playground.

Key Highlights

  • GPT-Realtime-2 quadruples its context window to 128K tokens (up from 32K) and supports parallel tool calls, adjustable reasoning effort, and controllable tone delivery
  • GPT-Realtime-Translate handles live translation from over 70 input languages into 13 output languages, keeping pace with the speaker
  • GPT-Realtime-Whisper transcribes audio as people speak, targeting live captions and meeting notes
  • Pricing starts at $0.017 per minute for Whisper and $0.034 per minute for Translate
  • Zillow and Deutsche Telekom are early production adopters

Details

GPT-Realtime-2 is priced at $32 per million audio input tokens (cached input at $0.40 per million) and $64 per million audio output tokens. According to OpenAI, that translates to roughly $0.30 per minute for a typical conversation before caching benefits.
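At those rates, per-session cost is simple arithmetic. A back-of-envelope sketch using the announced prices (the token counts in the example are illustrative, not OpenAI figures):

```python
# Cost calculator using the announced GPT-Realtime-2 audio rates:
# $32/M input tokens, $0.40/M cached input, $64/M output tokens.
PRICE_IN = 32 / 1_000_000        # $ per uncached audio input token
PRICE_CACHED = 0.40 / 1_000_000  # $ per cached audio input token
PRICE_OUT = 64 / 1_000_000       # $ per audio output token

def session_cost(in_tokens: int, cached_tokens: int, out_tokens: int) -> float:
    """Dollar cost of one session given its token counts."""
    return (in_tokens * PRICE_IN
            + cached_tokens * PRICE_CACHED
            + out_tokens * PRICE_OUT)

# Illustrative session: 10k fresh input, 5k cached input, 20k output tokens.
print(round(session_cost(10_000, 5_000, 20_000), 3))  # → 1.602
```

Caching matters here: the same 5,000 tokens billed at the uncached rate would cost $0.16 instead of $0.002.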

The model is built for production voice agents — it handles tool calls, recovers from corrections and interruptions gracefully, and now reasons mid-conversation rather than only at turn boundaries. OpenAI reports a 15.2 percent jump on the Big Bench Audio benchmark over the previous generation.
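For orientation, a session configured for these features might look like the sketch below. It follows the general shape of the existing Realtime API's `session.update` event; the model name comes from the announcement, but the `reasoning_effort` and `tone` field names are assumptions for illustration, not confirmed parameters.

```python
import json

# Hypothetical session configuration for a GPT-Realtime-2 voice agent.
# Field names for reasoning effort and tone are illustrative assumptions.
session_update = {
    "type": "session.update",
    "session": {
        "model": "gpt-realtime-2",     # model name from the announcement
        "reasoning_effort": "medium",  # assumed knob: low / medium / high
        "tone": "warm",                # assumed tone-delivery control
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",  # example tool for a support agent
                "description": "Fetch an order record by its ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            }
        ],
    },
}

# The event would be serialized and sent over the WebSocket connection.
payload = json.dumps(session_update)
```

With parallel tool calls, the model could invoke several such tools in one turn while continuing to speak, rather than pausing the conversation for each call.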

GPT-Realtime-Translate covers more than 70 input languages and 13 output languages, making it suitable for cross-border customer support, live captioning, and language-learning use cases. GPT-Realtime-Whisper, the streaming successor to the original Whisper, is positioned as the lowest-cost option in the lineup at under two cents per minute.

Impact

The release reframes voice AI from "press button, get answer" into "agent listens, reasons, and acts mid-conversation." For developers, it means a single API surface can now power call-center automation, live translation, and meeting transcription — three workloads that previously required stitching together separate vendors.

The pricing is the line that should worry incumbents. Translation under four cents per minute undercuts traditional human interpretation services and many specialized translation APIs. Streaming transcription at $0.017 per minute pressures established providers in the captioning and meeting-notes market.

Background

Zillow, the U.S. real-estate marketplace, is already running GPT-Realtime-2 in production for home tour interactions, reporting improvements in call success rates and compliance robustness. Deutsche Telekom is testing GPT-Realtime-Translate for cross-language customer support across its European footprint.

OpenAI also confirmed enterprise-grade safeguards: active safety classifiers to filter harmful content, EU data residency support, and standard enterprise privacy commitments. The models are available for experimentation in the OpenAI Playground today.

What's Next

The shift toward voice-first AI agents is accelerating. Google's Gemini 3.1 Flash Live, Mistral's Voxtral, and now OpenAI's three-model lineup signal that the next generation of consumer and enterprise apps will be voice-native by default. Expect a wave of specialized voice agents — for sales, support, healthcare intake, and field service — built on top of the Realtime API in the coming quarters.

For Arabic and French speakers in the MENA region, the 70+ input languages on GPT-Realtime-Translate also open the door to dialect-aware voice products without the heavy data-collection burden that previously gated these markets.


Source: OpenAI