Google Adds Efficient Gemini 3.5 Flash Mode

Google has made Gemini 3.5 Flash at the Low thinking level the recommended default model in its Antigravity developer platform. The update aims to reduce token consumption while maintaining strong performance for common development tasks.

The change follows the launch of Gemini 3.5 Flash at Google I/O 2026. According to Google's internal testing, the Low setting uses about 45% fewer tokens than the Medium configuration, while still outperforming the earlier Gemini 3 Flash model on software engineering benchmarks.
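To put the reported savings in concrete terms, here is a rough back-of-the-envelope sketch. It assumes the roughly 45% reduction applies uniformly to every task, which real workloads will not match exactly. The function names and the sample quota and per-task figures are hypothetical and are not taken from Google's documentation.

```python
# Assumption: the ~45% saving reported for Low vs. Medium applies uniformly.
LOW_VS_MEDIUM_SAVINGS_PERCENT = 45


def estimate_low_tokens(medium_tokens: int) -> int:
    """Estimate token usage at Low from measured usage at Medium."""
    if medium_tokens < 0:
        raise ValueError("medium_tokens must be non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    return medium_tokens * (100 - LOW_VS_MEDIUM_SAVINGS_PERCENT) // 100


def tasks_within_quota(quota_tokens: int, tokens_per_task: int) -> int:
    """Number of whole tasks that fit in a token quota."""
    if tokens_per_task <= 0:
        raise ValueError("tokens_per_task must be positive")
    return quota_tokens // tokens_per_task


# Hypothetical figures for illustration only.
medium_per_task = 200_000
low_per_task = estimate_low_tokens(medium_per_task)
quota = 2_000_000

print(low_per_task)                                # 110000
print(tasks_within_quota(quota, medium_per_task))  # 10
print(tasks_within_quota(quota, low_per_task))     # 18
```

Under these assumed numbers, the same quota covers 18 tasks at Low instead of 10 at Medium.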

As developers increasingly adopt agentic AI workflows, efficient token usage has become a critical concern. Therefore, Google is focusing on balancing performance, speed, and operational costs across its AI ecosystem.

Lower Consumption Eases Developer Constraints

Gemini 3.5 Flash introduced configurable reasoning levels, including Minimal, Low, Medium, and High. Consequently, developers can adjust the model’s reasoning depth based on task requirements.
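One way a team might apply these levels is to map task types to a reasoning depth and fall back to Low when nothing else is specified. The sketch below is purely illustrative: the enum, its string values, the task categories, and the mapping are all hypothetical and do not represent the Gemini API or Antigravity configuration.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # The four levels named in the article; the string values are assumptions.
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task categories and level assignments, for illustration only.
TASK_LEVELS = {
    "rename_symbol": ThinkingLevel.MINIMAL,
    "fix_failing_test": ThinkingLevel.LOW,
    "multi_file_refactor": ThinkingLevel.MEDIUM,
    "architecture_design": ThinkingLevel.HIGH,
}


def pick_level(task_kind: str) -> ThinkingLevel:
    """Return the level for a task kind, defaulting to Low for unknown kinds."""
    return TASK_LEVELS.get(task_kind, ThinkingLevel.LOW)


print(pick_level("multi_file_refactor").value)  # medium
print(pick_level("write_docstring").value)      # low
```

Defaulting to Low mirrors the recommended default described in this article; a real project would tune the mapping against its own quality and cost measurements.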

However, early testing revealed that the model consumed substantial token volumes during complex agentic workflows. As a result, developers operating under strict usage limits faced challenges managing costs and quotas.

Meanwhile, Google has adjusted its platform policies to improve accessibility. The company reset quotas across free and paid plans and introduced a new AI Ultra subscription tier for advanced users. In addition, Google expanded platform capacity during certain periods to help reduce usage bottlenecks.

Google Balances Performance and Accessibility

Despite the lower reasoning setting, Gemini 3.5 Flash retains its one-million-token context window and supports outputs of up to 65,536 tokens. Therefore, the model remains suitable for long-running workflows, large codebases, and complex AI-driven tasks.
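The two limits above lend themselves to a simple pre-flight check before sending a long request. This is a minimal sketch under stated assumptions: it treats "one million" as exactly 1,000,000 tokens and conservatively assumes that prompt and output tokens share the context window, which may not match how the service actually accounts for them. The function name is hypothetical.

```python
# Limits as stated in the article; exact values are assumptions where noted.
CONTEXT_WINDOW_TOKENS = 1_000_000  # "one-million-token" read as exactly 1,000,000
MAX_OUTPUT_TOKENS = 65_536


def fits_budget(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Conservatively check a request against the stated limits.

    Assumption: prompt and output tokens together must fit in the context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


print(fits_budget(900_000, 65_536))  # True
print(fits_budget(950_000, 65_536))  # False
print(fits_budget(1_000, 70_000))    # False
```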

The update reflects Google’s broader effort to make advanced AI tools more accessible without compromising capability. Furthermore, the company continues to refine resource allocation as demand for agentic AI platforms grows.

By making the Low thinking level the default option, Google aims to improve efficiency while preserving the benefits of its latest Flash model. Developers can therefore use advanced AI capabilities with lower token consumption and more manageable quotas.