DeepSeek Reveals Training Costs of AI Model


Chinese AI developer DeepSeek disclosed that training its reasoning-focused R1 model cost $294,000, far less than figures linked to U.S. rivals. This revelation has reignited debate over China’s position in the global AI race. The company reported that R1 was trained using 512 Nvidia H800 chips, a cluster designed for the Chinese market after U.S. restrictions limited access to more powerful hardware.

The announcement comes months after DeepSeek released lower-cost AI systems in January, raising global concerns about market disruption. Investors reacted sharply, fearing new competition for established leaders in the AI industry. Since then, DeepSeek has continued incremental product development, releasing only a handful of updates.

Use of Chips and Training Methods

Training large language models requires extensive chip clusters operating for long periods to process vast volumes of data. DeepSeek confirmed that it used A100 GPUs during the preparatory stages of model development before transitioning to H800 chips for the final 80 hours of training. This acknowledgement highlighted the company’s access to advanced resources despite U.S. restrictions.

The firm also drew attention as one of the few companies in China operating an A100 supercomputing cluster. That capacity allowed it to attract top AI talent, further strengthening its model-development capabilities, and its approach to resource management has fueled ongoing discussion among industry observers.

Model Distillation and Data Sources

DeepSeek has defended its reliance on model distillation, a technique in which one AI system learns from the outputs of another. The process lowers costs, reduces energy demands, and broadens access to AI technology. In practice, DeepSeek used Meta's open-source Llama model for certain distilled versions of its systems.
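In general terms, distillation trains a smaller "student" model to match the full output distribution of a larger "teacher" rather than just hard labels. A minimal sketch of the standard soft-label loss, not DeepSeek's actual implementation, and assuming the common temperature-softened KL-divergence formulation, might look like:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; higher temperature softens the distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened outputs and the student's.

    Training the student to minimize this loss makes it mimic the teacher's
    whole output distribution, which is how one model "learns from" another.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft labels
    q = softmax(student_logits, temperature)
    return float(np.sum(p * np.log(p / q)))

# A student whose logits match the teacher's incurs zero loss;
# a student that disagrees is penalized in proportion to the mismatch.
teacher = [3.0, 1.0, 0.2]
loss_aligned = distillation_loss(teacher, [3.0, 1.0, 0.2])
loss_diverging = distillation_loss(teacher, [0.2, 1.0, 3.0])
```

Because the loss depends only on the teacher's outputs, not its weights, distillation can also transfer knowledge indirectly when teacher-generated text ends up in another model's training data.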


The company noted that training data for its V3 model contained a significant number of AI-generated responses, which may have indirectly transferred knowledge from other advanced models. While this overlap was described as incidental, it continues to raise questions about the boundaries between proprietary and open-source AI development.


© 2024 The Technology Express. All Rights Reserved.