TOOL UPDATES

Google Releases Gemini 3.1 Pro with Double the Reasoning Power and Top Benchmark Scores

megaone_admin · Mar 26, 2026 · 2 min read
Engine Score 8/10 — Important

This story is important because Google has released an updated flagship model, Gemini 3.1 Pro, whose reported benchmark performance significantly affects the competitive landscape and developer choices. The report comes from a reputable secondary source; direct confirmation from Google or independent evaluators would strengthen verification.


Google released Gemini 3.1 Pro on February 19, 2026, delivering more than double the reasoning capability of its predecessor on the ARC-AGI-2 benchmark. The model scored 77.1 percent on ARC-AGI-2 compared to Gemini 3 Pro’s 31.1 percent, while achieving 94.3 percent on GPQA Diamond, a test of graduate-level scientific knowledge. It maintains the same pricing as Gemini 3 Pro at $2.00 per million input tokens.
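The stated rate makes cost estimates straightforward. As a minimal sketch, assuming only the $2.00-per-million input-token price quoted above (the article does not give an output-token rate, so this covers input only):

```python
# Estimate input cost at the stated Gemini 3.1 Pro rate of $2.00 per
# million input tokens. Output pricing is not given in the article,
# so this sketch covers input tokens only.
INPUT_PRICE_PER_MILLION = 2.00  # USD

def input_cost(tokens: int) -> float:
    """Return the USD cost of sending `tokens` input tokens."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_MILLION

# A 100,000-token prompt costs 20 cents of input at this rate.
print(f"${input_cost(100_000):.2f}")  # → $0.20
```

At that rate, even prompts approaching the full million-token window stay around $2 of input cost per request.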

The benchmark results position Gemini 3.1 Pro as a serious contender across multiple dimensions. On LiveCodeBench Pro, it achieved an Elo rating of 2887, outperforming GPT-5.2’s score of 2393. On SWE-Bench Verified, which tests real-world software engineering tasks, the model scored 80.6 percent — competitive with Claude Opus 4.6’s 80.8 percent at less than half the cost. On Humanity’s Last Exam, Gemini 3.1 Pro scored 44.4 percent, beating both Claude Opus 4.6 at 40.0 percent and GPT-5.2 at 34.5 percent.

The model supports an input context window of 1,048,576 tokens and can generate outputs up to 65,536 tokens, giving it the capacity to process book-length documents and produce detailed analyses in a single pass. For developers, it is accessible through the Gemini API in Google AI Studio, Gemini CLI, Google Antigravity, and Android Studio. Enterprise customers can access it through Vertex AI and Gemini Enterprise.
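To make the "book-length documents" claim concrete, here is a rough feasibility check against the stated 1,048,576-token input window. The 4-characters-per-token ratio is a common rule of thumb for English text, not the model's actual tokenizer, so treat the result as an estimate:

```python
# Rough check of whether a document fits Gemini 3.1 Pro's stated
# 1,048,576-token input window. The 4-characters-per-token ratio is a
# rule-of-thumb heuristic, not the model's real tokenizer.
INPUT_WINDOW_TOKENS = 1_048_576
CHARS_PER_TOKEN = 4  # heuristic for English prose

def fits_in_context(text: str) -> bool:
    """Estimate whether `text` fits in a single input pass."""
    estimated_tokens = len(text) // CHARS_PER_TOKEN
    return estimated_tokens <= INPUT_WINDOW_TOKENS

# A 3-million-character manuscript is roughly 750,000 tokens under this
# heuristic, so it fits in one pass.
book = "x" * 3_000_000
print(fits_in_context(book))  # → True
```

Under the same heuristic, anything past roughly 4.2 million characters would need to be chunked across multiple requests.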

Early enterprise feedback has been positive. JetBrains Director of AI Vladislav Tankov reported a 15 percent quality improvement over previous Gemini versions in code assistance tasks. Databricks CTO Hanlin Tang described best-in-class results on their internal OfficeQA benchmark. Free Gemini users can access the model with usage limits, while Google AI Pro and Ultra subscribers get higher allocation.

The release intensifies the three-way competition between Google, OpenAI, and Anthropic for the frontier AI crown. Each company now has a model that leads on at least some benchmarks: Gemini 3.1 Pro dominates ARC-AGI-2 and LiveCodeBench, Claude Opus 4.6 leads SWE-Bench Verified, and GPT-5.4 recently introduced native computer use with a million-token context window. The practical differences for most users are narrowing even as the benchmark numbers diverge.

Not all feedback has been positive. Some users have reported that Gemini 3.1 Pro feels more mechanical than its predecessor, noting reduced emotional depth and creative flexibility in conversational interactions. This tension between benchmark performance and subjective quality — the difference between a model that scores well on tests and one that feels good to use — remains an unresolved challenge across the industry.



MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.
