Google Unveils Gemini 3.1 Flash Live Voice Model

Google has introduced Gemini 3.1 Flash Live, a new audio and voice model built for natural, low-latency conversations across platforms. The model is designed to improve interactions in developer tools, enterprise systems, and consumer applications.

Developers can access the model through the Gemini Live API in Google AI Studio, businesses through Gemini Enterprise for Customer Experience, and consumers through Search Live and Gemini Live.

Google DeepMind CEO Demis Hassabis described it as “a big leap towards building next-gen voice-first agents”.

Benchmark results highlight the model’s capabilities: it scored 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI’s Audio MultiChallenge with “thinking” enabled. The model also demonstrates improved tonal understanding, recognizing pitch, pace, and other acoustic nuances, which lets it adjust its responses when users sound frustrated or confused.

Consumer-facing features have also improved. Gemini Live now responds faster and retains conversation context twice as long as the previous version, and the rollout expands Search Live to more than 200 countries and territories with multilingual support.

Early Enterprise Adoption

Several companies have already begun testing the model in real-world workflows, including Verizon, The Home Depot, and LiveKit, which have explored its capabilities in customer-facing and operational settings.

A Verizon representative said the audio-to-audio capability made virtual agents sound more natural and removed latency issues when conveying information to customers.

Similarly, The Home Depot emphasized the model’s ability to capture complex details, including alphanumeric product codes, even in noisy environments. In addition, it supports real-time language switching, which enhances usability across diverse customer interactions.

Safety Features and Release

To address concerns around AI-generated content, the model embeds SynthID watermarking in all audio outputs. This imperceptible marker allows detection systems to reliably identify AI-generated audio.

The model is now available through Google AI Studio, and the API changelog confirms the release of the gemini-3.1-flash-live-preview identifier on March 26. Developers and businesses can begin integrating it into their applications immediately.
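For developers exploring the preview, a session might look roughly like the sketch below. This is a minimal, illustrative example assuming the google-genai Python SDK and its Live API surface (client.aio.live.connect); the exact method names, config fields, and response format should be verified against Google’s current documentation before use. Only the gemini-3.1-flash-live-preview identifier comes from the changelog cited above.

```python
# Hedged sketch: assumes the google-genai Python SDK's Live API.
# Requires: pip install google-genai, plus a GEMINI_API_KEY environment variable.
import asyncio
import os

MODEL_ID = "gemini-3.1-flash-live-preview"  # identifier cited in the API changelog


async def run_live_session(prompt: str) -> None:
    # Imported inside the function so the sketch stays readable without the SDK.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    # Request audio responses; TEXT is also a supported response modality.
    config = types.LiveConnectConfig(response_modalities=["AUDIO"])

    async with client.aio.live.connect(model=MODEL_ID, config=config) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text=prompt)])
        )
        audio = bytearray()
        async for message in session.receive():
            if message.data:  # raw audio chunks streamed back from the model
                audio.extend(message.data)
        print(f"Received {len(audio)} bytes of audio")


# Example usage (requires a valid API key and network access):
# asyncio.run(run_live_session("Summarize today's weather in one sentence."))
```

Because the Live API is session-based and bidirectional, real applications would typically stream microphone input and play audio back as chunks arrive rather than buffering a single turn as shown here.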

© 2024 The Technology Express. All Rights Reserved.