Abu Dhabi-based G42 has released a major upgrade to its open-source Hindi-English large language model, NANDA. At 87 billion parameters, the new version sets a benchmark for Hindi-centric AI and becomes the largest Hindi-focused model available with open weights, marking a significant advance in regional language AI capabilities.
Built on Llama-3.1 70B, the upgraded model was trained on a curated dataset of more than 65 billion Hindi tokens, and a custom Hindi-centric tokenizer improves efficiency and reduces both training and inference time, allowing stronger performance at greater scale. The model was developed by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in collaboration with Inception and Cerebras.
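To make the tokenizer claim concrete, here is a minimal, purely illustrative sketch (not NANDA's actual tokenizer) of why a Hindi-aware vocabulary lowers token counts: Devanagari characters occupy three bytes each in UTF-8, so a tokenizer that falls back to bytes spends many tokens per Hindi word, while a vocabulary containing whole Hindi words needs roughly one token per word.

```python
# Illustrative only: compares a worst-case byte-level tokenization of Hindi
# text against an idealised word-level tokenization. The figures below are
# assumptions for the sake of the sketch, not measurements of NANDA.
text = "नमस्ते दुनिया"  # "Hello, world" in Hindi (Devanagari script)

byte_level_tokens = len(text.encode("utf-8"))  # one token per UTF-8 byte
word_level_tokens = len(text.split())          # one token per whole word

print(f"byte-level: {byte_level_tokens} tokens")  # 37
print(f"word-level: {word_level_tokens} tokens")  # 2
print(f"reduction:  {byte_level_tokens / word_level_tokens:.1f}x")
```

Real subword tokenizers land between these two extremes, but the gap explains why a vocabulary tuned for Devanagari shortens sequences and therefore cuts both training and inference cost.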
Designed for real-world Hindi use
The model is engineered for practical, real-world applications: it supports formal Hindi in Devanagari script, casual conversational Hindi, and Hinglish, and it performs strongly across translation, summarisation, instruction-following, and transliteration tasks. Safety and cultural alignment remain central to its design, yielding context-aware and responsible outputs.
India represents a critical market for such innovation, with more than 600 million Hindi speakers, a rapidly expanding digital economy, and over 80 percent of new internet users preferring local languages. Hindi-centric models like NANDA can therefore help bridge digital and linguistic divides at scale.
Training infrastructure and open access
The model was trained on Condor Galaxy, one of the world's most powerful AI supercomputers for training and inference. The upgraded Hindi LLM is now available as an open-weight model on MBZUAI's Hugging Face page, where developers, creators, and enterprises can explore its capabilities and build new applications on top of it.