
Meta launches ASR models for 1,600 languages, including rare Indian dialects

Omnilingual ASR model interface

Meta has introduced a groundbreaking suite of open-weight AI models that support speech-to-text capabilities across more than 1,600 languages. Among these, 500 are considered low-resource languages now receiving transcription support for the first time. The models, known as Omnilingual ASR, were developed by the Fundamental AI Research team. Alongside these, Meta launched a multilingual speech representation model called Omnilingual wav2vec 2.0, which can scale up to seven billion parameters to help developers build a variety of speech-based AI tools.

The suite covers numerous Indian languages such as Hindi, Marathi, Telugu, Malayalam, Odia, and Punjabi. It also extends to lesser-spoken languages and dialects including Kui, Awadhi, Maithili, Chhattisgarhi, and Bagheli. By making transcribed speech datasets in 350 underserved languages publicly available through its Omnilingual ASR Corpus, Meta aims to democratize access to multilingual AI research.

This move arrives as Indian AI startups accelerate the creation of Indic language models under national initiatives like Mission Bhashini. These government-backed programs promote linguistic diversity and encourage innovation in local language technologies. However, many startups still face steep competition from major AI firms expanding their presence in India’s fast-growing digital landscape.

Addressing Challenges in Low-Resource Languages

Even with growing investment, training effective speech models for long-tail languages remains difficult. Because such languages are rarely represented online, developers often struggle with data scarcity. “This means high-quality transcriptions are often unavailable for speakers of less widely represented or low-resource languages, furthering the digital divide,” Meta explained in a blog post.

To solve this, Meta designed its Omnilingual ASR system as a community-centered framework. It allows speakers to add new languages by submitting a handful of their own audio-text samples. “In practice, this means that a speaker of an unsupported language can provide only a handful of paired audio-text samples and obtain usable transcription quality without training data at scale, onerous expertise, or access to high-end compute,” the company said. This approach encourages wider participation and faster language inclusion.

The Omnilingual wav2vec 2.0 Model and Open Data Efforts

Under a permissive Apache 2.0 license, Meta released its self-supervised multilingual speech model, Omnilingual wav2vec 2.0. “First, we scaled our previous wav2vec 2.0 speech encoder to 7B parameters for the first time, producing rich, massively multilingual semantic representations from raw, untranscribed speech data,” the company noted. It added that two decoder variants were built on top of this encoder — one employing a traditional connectionist temporal classification (CTC) objective, and another (dubbed LLM-ASR) using a standard transformer decoder commonly found in large language models.
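Meta's announcement doesn't include code, but the CTC objective mentioned above has a well-known decoding step: the model emits a label (or a special blank) per acoustic frame, and decoding collapses repeated labels and drops blanks. The sketch below illustrates that greedy collapse only; the blank symbol and frame labels are illustrative assumptions, not taken from Omnilingual ASR.

```python
BLANK = "_"  # illustrative CTC blank symbol, not Meta's actual vocabulary

def ctc_greedy_decode(frame_labels: list[str]) -> str:
    """Greedy CTC decoding: merge consecutive duplicates, then drop blanks."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

# Hypothetical per-frame argmax labels across 8 acoustic frames
frames = ["c", "c", BLANK, "a", "a", BLANK, "t", "t"]
print(ctc_greedy_decode(frames))  # cat
```

A transformer-decoder variant, by contrast, would generate the transcript token by token conditioned on the encoder output, rather than making an independent prediction per frame.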


The LLM-ASR model achieved character error rates below 10 percent for over 78 percent of the supported languages, signaling strong performance across diverse linguistic systems. Furthermore, the Omnilingual ASR Corpus was assembled in partnership with local organizations that recruited and compensated native speakers, often in under-documented regions. Meta also collaborated with linguists and community initiatives such as Common Voice to make these datasets openly available under a CC-BY license. This accessibility enables researchers and developers worldwide to design inclusive voice technologies with genuine cultural relevance.
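For context on the metric cited above: character error rate (CER) is the character-level edit (Levenshtein) distance between a reference transcript and the model's output, divided by the reference length. A minimal self-contained implementation (not Meta's evaluation code) looks like this:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # Dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (rc != hc),    # substitution (0 if match)
            ))
        prev = cur
    return prev[-1] / len(r)

print(cer("hello world", "hello world"))  # 0.0
print(cer("hello world", "helo world"))   # one deletion over 11 chars ≈ 0.0909
```

A CER below 10 percent means fewer than one character in ten of the reference transcript needs to be inserted, deleted, or substituted to match the model output.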

Recent reports suggest Meta is experimenting with culturally tailored, Hindi-speaking chatbots by collaborating with local residents and artists to add regional authenticity. Through such steps, Meta continues expanding its vision for globally inclusive AI while encouraging shared progress in multilingual technology.


© 2024 The Technology Express. All Rights Reserved.