On May 7 (U.S. time), OpenAI unveiled three new Realtime speech models at its developer conference: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. All three are available to developers through the Realtime API. In its official announcement, OpenAI said GPT-Realtime-2 is its first speech model with GPT-5-level reasoning, able to reason in real time during voice conversations, call tools, handle corrections, and maintain a natural conversational cadence.
GPT-Realtime-2: context window grows from 32K to 128K, with five adjustable reasoning levels
Core upgrades for GPT-Realtime-2:
Context window: expanded from 32K to 128K tokens
Adjustable reasoning level, in five steps: minimal, low, medium, high, xhigh
Big Bench Audio: 96.6% at high reasoning, versus 81.4% for the previous GPT-Realtime-1.5
Audio MultiChallenge (instruction following): 48.5% at xhigh reasoning, versus 34.7% for the previous model
The larger context window and adjustable reasoning level let developers trade off between “cheap and fast” and “deep thinking” depending on the scenario: simple customer service can run in minimal mode to control costs, while complex tasks can switch to xhigh for GPT-5-level reasoning quality.
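The trade-off described above can be sketched as a small selection helper. This is only an illustration: the five level names come from the announcement, but the `complexity` score, the equal-width bands, and the function name `pick_reasoning_level` are assumptions, not part of OpenAI's API.

```python
# The five levels named in the announcement, ordered from cheapest to deepest.
REASONING_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def pick_reasoning_level(complexity: float) -> str:
    """Map a hypothetical complexity score in [0, 1] to one of the five levels.

    The score and the equal-width bands are illustrative assumptions; a real
    application would choose thresholds from its own latency and cost data.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    # Split [0, 1] into five bands; a score of exactly 1.0 is clamped
    # into the top band so the index never runs past the tuple.
    index = min(int(complexity * len(REASONING_LEVELS)), len(REASONING_LEVELS) - 1)
    return REASONING_LEVELS[index]
```

Under these assumptions, a routine customer-service turn scored near 0 would map to minimal, and a multi-step task scored near 1 would map to xhigh.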
Two specialized models launch alongside it: Translate for cross-language speech and Whisper for real-time transcription
The three new models are split by function:
GPT-Realtime-Translate: real-time multilingual speech translation, supporting 70 input languages and 13 output languages
GPT-Realtime-Whisper: low-latency streaming transcription that outputs text as people speak, suitable for live captions, meeting notes, and verbatim classroom transcripts
GPT-Realtime-2: a full conversational agent that can reason, use tools, and carry out actions
Translate and Whisper target specific speech applications. Translation and transcription are more latency- and cost-sensitive than general dialogue, so dedicated models can be tuned for those metrics.
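The functional split above amounts to a simple routing decision, sketched here. The strings are the display names used in this article; the identifiers the Realtime API actually expects may differ, and the task labels and `choose_model` helper are hypothetical.

```python
# Display names are taken from the article; the model identifiers accepted by
# the API may differ, so treat these strings as placeholders.
MODEL_FOR_TASK = {
    "translate": "GPT-Realtime-Translate",   # cross-language speech
    "transcribe": "GPT-Realtime-Whisper",    # streaming speech-to-text
    "converse": "GPT-Realtime-2",            # reasoning, tools, actions
}


def choose_model(task: str) -> str:
    """Return the model the article associates with a (hypothetical) task label."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```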
Pricing: GPT-Realtime-2 costs $32 per million input tokens and $64 per million output tokens
Pricing for the three models:
GPT-Realtime-2: $32 per million audio input tokens, $0.40 per million cached input tokens, $64 per million output tokens
GPT-Realtime-Translate: $0.034 per minute
GPT-Realtime-Whisper: $0.017 per minute
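To make the price list concrete, here is a rough cost estimator built from the figures above. It assumes that the GPT-Realtime-2 prices are per million tokens, that the $0.40 cached rate is also per million tokens, and that cached tokens are counted separately from regular input tokens; the function names are illustrative only.

```python
# Prices quoted in the article, in US dollars.
TOKEN_PRICE_PER_MILLION = {"input": 32.00, "cached_input": 0.40, "output": 64.00}
PER_MINUTE_PRICE = {"translate": 0.034, "whisper": 0.017}


def realtime2_cost(input_tokens: int, output_tokens: int,
                   cached_input_tokens: int = 0) -> float:
    """Estimate a GPT-Realtime-2 bill in dollars.

    Assumes input_tokens excludes cached tokens, which are billed
    separately at the cached rate.
    """
    total = (
        input_tokens * TOKEN_PRICE_PER_MILLION["input"]
        + cached_input_tokens * TOKEN_PRICE_PER_MILLION["cached_input"]
        + output_tokens * TOKEN_PRICE_PER_MILLION["output"]
    )
    return total / 1_000_000


def per_minute_cost(model: str, minutes: float) -> float:
    """Estimate a bill for the per-minute models ("translate" or "whisper")."""
    return PER_MINUTE_PRICE[model] * minutes
```

Under these assumptions, one million uncached input tokens plus half a million output tokens comes to $32 + $32 = $64, and an hour of streaming transcription comes to about $1.02.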
Developments to watch: how widely GPT-Realtime-2 voice agents are adopted in production, how much the new models cannibalize the existing GPT-4o speech models, and how competitors such as Anthropic and Google respond.
This article, “OpenAI launches GPT-Realtime-2: brings GPT-5 reasoning into voice agents, context upgraded to 128K,” first appeared on LianNews ABMedia.