Nvidia’s Llama-3.1 Nemotron Ultra Beats DeepSeek R1 Efficiently

Nvidia’s Nemotron Ultra model outperforms DeepSeek R1 despite being less than half its size.

Nvidia has introduced its newest large language model, Llama-3.1-Nemotron-Ultra-253B, and it’s turning heads. Built on Meta’s Llama-3.1-405B-Instruct, this open-source model is compact yet powerful. Despite having fewer than half the parameters of competitors like DeepSeek R1, it outperforms them on several key benchmarks.

Unveiled during Nvidia’s GTC conference and officially released on April 7, the model is now publicly available on Hugging Face. It supports two operating modes—reasoning on and off—allowing developers to switch between complex problem solving and simpler tasks. Additionally, the model’s dense 253-billion-parameter design has been optimized for inference, making it cost-effective and scalable.

What truly sets Nemotron Ultra apart is its architectural innovation. Nvidia employed Neural Architecture Search (NAS) to introduce elements like fused feedforward networks and skipped attention layers. As a result, the model reduces memory load and runs efficiently on a single 8x H100 GPU node. Moreover, it supports newer hardware such as the B100 alongside Hopper-generation GPUs, further widening its usability.
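To make the idea concrete, here is a toy sketch (not Nvidia’s actual code) of how an NAS-derived layout might drop the attention sublayer in some transformer blocks while keeping the feedforward path, trading a little flexibility for memory and compute savings. The block count and skip pattern are illustrative assumptions, not the model’s real configuration.

```python
# Toy sketch of a NAS-style layer plan: some transformer blocks keep only
# their feedforward sublayer, skipping attention entirely.
# The skip pattern here is an assumption for illustration only.

def build_layout(num_blocks, skip_attention_every=4):
    """Return a per-block plan; every Nth block drops its attention sublayer."""
    layout = []
    for i in range(num_blocks):
        layout.append({
            "block": i,
            # Blocks at multiples of `skip_attention_every` skip attention.
            "attention": (i % skip_attention_every != 0),
            "ffn": True,  # the feedforward path is always kept
        })
    return layout

layout = build_layout(8)
attn_blocks = sum(b["attention"] for b in layout)
print(f"{attn_blocks}/8 blocks keep attention")  # prints "6/8 blocks keep attention"
```

In the real model, NAS searches over such per-block choices (and fused feedforward variants) to find a layout that preserves quality while fitting the memory budget of a single 8x H100 node.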

Post-Training Drives Smarter Performance

Nvidia didn’t stop at a smart design. The model went through a rigorous post-training process, combining supervised fine-tuning with reinforcement learning using Group Relative Policy Optimization (GRPO). This helped boost its instruction-following and reasoning capabilities.

It also underwent knowledge distillation across 65 billion tokens, followed by continual pretraining on an additional 88 billion. Datasets like FineWeb and Dolma contributed to its training, while synthetic prompts helped fine-tune its reasoning toggles.

Thanks to this layered training, Nemotron Ultra achieved standout results. For example, on the MATH500 benchmark, its reasoning mode reached 97.00%, nearly matching DeepSeek R1’s 97.3%. On LiveCodeBench, it scored 66.31%, slightly outperforming its larger rival.

Designed for Developers and Real-World Use

The model is developer-friendly and supports sequences up to 128,000 tokens. It integrates with Hugging Face Transformers (version 4.48.3) and offers multilingual capabilities, including support for Hindi, Spanish, Thai, and more.

For usage, Nvidia recommends adjusting system prompts to toggle reasoning and selecting decoding methods based on the task—temperature sampling for creativity, greedy decoding for consistency.
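A minimal sketch of that recipe is shown below. The control strings ("detailed thinking on"/"detailed thinking off") and sampling values are assumptions drawn from Nvidia’s published model card, not verified here; check the card on Hugging Face before relying on them.

```python
# Sketch: toggling Nemotron Ultra's reasoning mode via the system prompt,
# and picking decoding settings per task. The exact control strings and
# sampling parameters are assumptions based on Nvidia's model card.

def build_messages(user_prompt, reasoning=True):
    """Build a chat message list with the reasoning toggle in the system turn."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

def generation_config(reasoning=True):
    """Temperature sampling for reasoning/creative tasks, greedy otherwise."""
    if reasoning:
        return {"do_sample": True, "temperature": 0.6, "top_p": 0.95}
    return {"do_sample": False}

msgs = build_messages("Prove that 17 is prime.", reasoning=True)
print(msgs[0]["content"])  # prints "detailed thinking on"
```

The message list can then be passed to a Transformers chat pipeline for the model, with the matching generation settings unpacked into the `generate` call.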

Released under the Nvidia Open Model License, the model is also cleared for commercial use. Nvidia urges users to test for safety, bias, and alignment in their applications. With its open weights and toggle-based reasoning, Nemotron Ultra is set to become a practical, high-performance alternative in today’s LLM landscape.


© 2024 The Technology Express. All Rights Reserved.