According to Beating, OpenAI has released three voice models in its Realtime API: GPT-Realtime-2 for voice conversation with reasoning, GPT-Realtime-Translate for real-time translation, and GPT-Realtime-Whisper for streaming transcription. GPT-Realtime-2 is OpenAI's first voice model with GPT-5-level reasoning. It expands the context window from 32K to 128K tokens, which is enough for roughly 1 to 2 hours of dense conversation.
Compared with GPT-Realtime-1.5, GPT-Realtime-2 improves by 15.2% on the Big Bench Audio benchmark and by 13.8% on Audio MultiChallenge. GPT-Realtime-Translate translates from more than 70 input languages into 13 output languages. Pricing: GPT-Realtime-2 costs $32 per million input tokens and $64 per million output tokens, GPT-Realtime-Translate costs $0.034 per minute, and GPT-Realtime-Whisper costs $0.017 per minute.
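To make the quoted rates concrete, here is a minimal cost-estimate sketch. It uses only the prices reported above and assumes they apply uniformly to all tokens or minutes. Actual billing may distinguish audio, text, and cached tokens, so treat the results as rough. The function names and the example workload are hypothetical.

```python
# Rates as quoted in the article (USD).
# Assumption: per-token rates apply uniformly to all input/output tokens
# (no separate audio, text, or cached-token tiers are modeled here).
REALTIME_2_INPUT_PER_M = 32.0    # per 1M input tokens
REALTIME_2_OUTPUT_PER_M = 64.0   # per 1M output tokens
TRANSLATE_PER_MIN = 0.034        # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017          # GPT-Realtime-Whisper, per minute


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated GPT-Realtime-2 cost in USD."""
    return (input_tokens * REALTIME_2_INPUT_PER_M
            + output_tokens * REALTIME_2_OUTPUT_PER_M) / 1_000_000


def per_minute_cost(minutes: float, rate_per_min: float) -> float:
    """Hypothetical helper: estimated cost in USD for a per-minute-billed model."""
    return minutes * rate_per_min


if __name__ == "__main__":
    # Hypothetical workload: 100K input tokens and 20K output tokens.
    print(f"Realtime-2: ${realtime_2_cost(100_000, 20_000):.2f}")
    # One hour each of translation and transcription.
    print(f"Translate (60 min): ${per_minute_cost(60, TRANSLATE_PER_MIN):.2f}")
    print(f"Whisper (60 min): ${per_minute_cost(60, WHISPER_PER_MIN):.2f}")
```

Run as a script, this prints $4.48 for the token workload, $2.04 for an hour of translation, and $1.02 for an hour of transcription. Output tokens cost twice as much as input tokens, so output-heavy sessions shift the GPT-Realtime-2 total more than input-heavy ones.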