Google Launches Real-Time AI Speech Translation in 70+ Languages

Google has introduced Gemini 3.5 Live Translate, an AI-powered speech translation model that delivers near real-time translations across more than 70 languages. The technology is intended to lower language barriers during live conversations.

Unlike traditional translation systems that wait for speakers to finish their sentences, Gemini 3.5 Live Translate processes speech continuously and delivers translated audio within seconds. It also preserves speech characteristics such as intonation, pacing, and pitch, which makes the conversation sound more natural.

The feature is rolling out globally through the Google Translate app on Android and iOS. Developers can access it through a public preview in the Gemini Live API and Google AI Studio. Selected business customers will also receive private preview access through Google Meet this month.

The Google Meet integration expands speech translation support from five languages to more than 70. That works out to over 2,000 language combinations available within a single meeting.
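The "over 2,000 language combinations" figure is consistent with simple pair counting. The sketch below assumes exactly 70 languages (the article says "more than 70") and treats a combination as an unordered pair of two different languages; the function name is illustrative only, and Google has not said how it counts combinations.

```python
from math import comb

def language_pairs(n_languages: int) -> tuple[int, int]:
    """Return (unordered pairs, ordered source->target pairs) for n languages."""
    unordered = comb(n_languages, 2)              # e.g. English/Hindi counted once
    ordered = n_languages * (n_languages - 1)     # English->Hindi and Hindi->English
    return unordered, ordered

unordered, ordered = language_pairs(70)
print(unordered, ordered)
```

Under these assumptions, 70 languages give 2,415 unordered pairs, which matches the "over 2,000" claim. Counting each translation direction separately would double that number.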

Android users will also get a new listening mode. It lets people hear translated speech through the phone’s earpiece by holding the device to their ear, as they would during a regular call. All AI-generated audio includes SynthID watermarking to help identify it as generated content.

DiffusionGemma Brings Faster AI Text Generation

Alongside the translation rollout, Google DeepMind unveiled DiffusionGemma, an experimental open model designed to accelerate text generation. Instead of predicting one token at a time, the model starts with noise and refines blocks of up to 256 tokens simultaneously, similar to how diffusion image models create pictures.
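To see why refining a whole block at once can need far fewer passes than generating token by token, consider the toy sketch below. It is not DiffusionGemma's algorithm: the `toy_denoiser` function is a hypothetical stand-in that cheats by knowing the target text, and the rule that each pass fixes half of the wrong positions is an assumption chosen purely for illustration.

```python
import random

def toy_denoiser(block, target, rng, fix_fraction=0.5):
    """Hypothetical stand-in for a model: one pass corrects a share of the
    wrong positions anywhere in the block (parallel refinement)."""
    wrong = [i for i, (b, t) in enumerate(zip(block, target)) if b != t]
    k = max(1, int(len(wrong) * fix_fraction)) if wrong else 0
    for i in rng.sample(wrong, k):
        block[i] = target[i]
    return block

def refine_block(target, vocab, seed=0, max_steps=50):
    """Start from random tokens (noise) and refine the whole block per pass."""
    rng = random.Random(seed)
    block = [rng.choice(vocab) for _ in target]
    steps = 0
    while block != target and steps < max_steps:
        block = toy_denoiser(block, target, rng)
        steps += 1
    return block, steps

vocab = [f"tok{i}" for i in range(10)]
target = [f"tok{i % 10}" for i in range(256)]   # a 256-token block
block, steps = refine_block(target, vocab)
print(f"refinement passes: {steps}, token-by-token passes: {len(target)}")
```

In this toy setup the block converges in at most nine passes, whereas a one-token-at-a-time loop would need 256. A real diffusion model's per-pass cost and step count differ, so this shows only the shape of the trade-off.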

Built on the Gemma 4 architecture, the 26-billion-parameter mixture-of-experts model activates only 3.8 billion parameters during inference. With this design, it exceeds 1,000 tokens per second on a single Nvidia H100 GPU and delivers roughly 700 tokens per second on a consumer GeForce RTX 5090.
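A little arithmetic puts these figures in context. The sketch below uses only numbers quoted in this article, and it assumes the quoted throughput is sustained while a full 256-token block is generated, which real workloads may not match. The variable and function names are illustrative.

```python
TOTAL_PARAMS_B = 26.0     # total parameters, in billions
ACTIVE_PARAMS_B = 3.8     # parameters active during inference, in billions
BLOCK_TOKENS = 256        # maximum block size quoted for DiffusionGemma
THROUGHPUT_TOK_S = {"H100": 1000.0, "RTX 5090": 700.0}  # quoted tokens per second

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

def seconds_per_block(tokens_per_second, block_tokens=BLOCK_TOKENS):
    """Time to emit one block at a given sustained throughput."""
    return block_tokens / tokens_per_second

print(f"active share of parameters: {active_fraction:.1%}")
for gpu, tps in THROUGHPUT_TOK_S.items():
    print(f"{gpu}: {seconds_per_block(tps):.3f} s per {BLOCK_TOKENS}-token block")
```

Roughly 15 percent of the parameters are active at inference time. Because the H100 figure is a lower bound ("exceeding 1,000"), the computed time per block for that GPU is an upper bound.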

The model weights are available under the Apache 2.0 open-source license through Hugging Face.

Google CEO Sundar Pichai highlighted the launch on social media, calling it “a racehorse achieving up to 4x faster inference” that brings the company’s text diffusion research to the Gemma 4 family.

Balancing Speed and Output Quality

Despite its speed advantages, Google noted that DiffusionGemma remains an experimental model and does not yet match standard Gemma 4 models on output-quality benchmarks.

The company recommends the model primarily for speed-sensitive local workloads, including inline editing, rapid iteration, and agent-based workflows. Organizations that need the highest output quality may still prefer conventional production-ready models.

Nvidia has optimized DiffusionGemma across its hardware ecosystem, from consumer GPUs to DGX Spark systems. The model is also supported from day one in vLLM, Hugging Face Transformers, and Unsloth, which makes deployment easier for developers and researchers.

© 2024 The Technology Express. All Rights Reserved.