OpenAI Unveils GPT-5-Level Real-Time Voice Models

OpenAI introduced three new audio models through its Realtime API on Thursday, aiming to make voice-powered applications more intelligent, multilingual, and easier to develop. The new lineup includes GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. Together, the models focus on live reasoning, translation, and transcription during voice interactions.

GPT-5-Level Reasoning Arrives in Voice Applications

GPT-Realtime-2 stands out as the flagship release because it brings GPT-5-class reasoning into real-time voice conversations. The company described it as “our most intelligent voice model yet.” In addition, the model includes a 128,000-token context window, which is four times larger than the 32,000-token limit available in GPT-Realtime-1.5.

The model also supports adjustable reasoning levels, ranging from minimal to high. According to benchmark testing, GPT-Realtime-2 scored about 15 percent higher on Big Bench evaluations than GPT-Realtime-1.5, its predecessor, which launched in February. The company is positioning the model as a step beyond scripted voice assistants.

OpenAI described the shift as moving toward “real-time collaborators that can listen, reason, and solve complex problems as conversations unfold.”

Translation, Transcription, and Early Testing Results

GPT-Realtime-Translate enables live speech translation from more than 70 input languages into 13 output languages while keeping pace with speakers in real time. Meanwhile, GPT-Realtime-Whisper delivers streaming speech-to-text transcription with adjustable latency settings. Lower latency produces faster partial text, whereas higher latency improves transcription accuracy.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens. GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute instead, at $0.034 and $0.017 per minute respectively.
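Because the three models are priced in different units, a rough cost estimate has to treat them separately. The Python sketch below is a minimal illustration using only the rates quoted above. The constant and function names are hypothetical. It covers GPT-Realtime-2 audio input tokens only, because no output-token price is given here. It also does not convert minutes of speech into tokens, since that ratio is not stated.

```python
# Rates quoted in the article, in USD. The constant names are hypothetical.
REALTIME_2_INPUT_PER_MILLION_TOKENS = 32.00  # GPT-Realtime-2, audio input ("starts at")
TRANSLATE_PER_MINUTE = 0.034                 # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017                   # GPT-Realtime-Whisper


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Return the USD cost for a model billed per minute of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * rate_per_minute


def realtime_2_input_cost(audio_input_tokens: int) -> float:
    """Return the USD cost of GPT-Realtime-2 audio *input* tokens only.

    Output tokens are excluded because the article quotes no output rate.
    """
    if audio_input_tokens < 0:
        raise ValueError("token count must be non-negative")
    return audio_input_tokens / 1_000_000 * REALTIME_2_INPUT_PER_MILLION_TOKENS


if __name__ == "__main__":
    # One hour of live translation and one hour of streaming transcription.
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min:   ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
    # Half a million GPT-Realtime-2 audio input tokens.
    print(f"Realtime-2, 500k input tokens: ${realtime_2_input_cost(500_000):.2f}")
```

At these rates, an hour of translation comes to about $2.04 and an hour of transcription to about $1.02. For GPT-Realtime-2, half a million audio input tokens cost $16.00 before any output charges.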

Several companies joined the early testing program and reported measurable improvements. Zillow reported a 26-point increase in call success rates using GPT-Realtime-2, reaching 95 percent compared to 69 percent with earlier systems. Likewise, BolnaAI reported a 12.5 percent reduction in word error rates when testing GPT-Realtime-Translate for Hindi, Tamil, and Telugu.

The API also includes safety measures, such as real-time classifiers that can halt conversations that violate content standards. OpenAI also says the service complies with European Union data regulations.

© 2024 The Technology Express. All Rights Reserved.