TOOL UPDATES

Google Agent Skill Raises Gemini 3.1 Pro Coding Rate from 28% to 96.6%

Ryan Matsuda · Mar 28, 2026 · Updated Apr 7, 2026 · 3 min read
Engine Score 8/10 — Important

This update significantly enhances Google Gemini's agent capabilities by closing a critical knowledge gap between model training data and current SDKs, making it easier to build robust AI agents. It is directly actionable for developers building on the Gemini platform.


Google released a new “Agent Skill” for the Gemini API on March 28, 2026, designed to address a structural limitation common to AI coding assistants: the gap between what a model learns at training time and what current APIs and SDKs require. The release was reported by Matthias Bastian at The Decoder. In benchmark testing across 117 tasks, the skill raised Gemini 3.1 Pro Preview’s success rate from 28.2% to 96.6%.

  • Google’s Agent Skill feeds coding agents real-time information about current Gemini models, SDKs, and sample code at inference time.
  • Tested across 117 coding tasks, Gemini 3.1 Pro Preview’s success rate rose from 28.2% to 96.6% when the skill was active.
  • Newer Gemini 3 series models gained significantly more than older 2.5 models; Google attributes the difference to stronger reasoning capabilities in the newer generation.
  • The skill is available on GitHub; Google is also evaluating MCP services as an additional mechanism for delivering updated API context to agents.

What Happened

Google’s Agent Skill targets a limitation that affects all AI coding assistants: once trained, language models have no awareness of subsequent API changes, SDK updates, or revised best practices. The skill compensates by supplying coding agents with current documentation, model information, and sample code at inference time — replacing reliance on training-time knowledge with context delivered dynamically. Matthias Bastian covered the release at The Decoder on March 28, 2026.

The skill was made publicly available on GitHub following the announcement. It is Google's implementation of the Agent Skill concept, a pattern introduced by Anthropic in late 2025 and quickly adopted by other AI companies.

Why It Matters

The knowledge-gap problem is not unique to Gemini. Any AI coding assistant faces the same challenge when the APIs and SDKs it targets are under active development. When vendors ship new model versions, deprecate methods, or revise recommended patterns, a model trained before those changes may produce code that compiles but fails to follow current conventions — or references methods that no longer exist.

Google’s benchmark numbers put a concrete figure on that cost. Without the skill, Gemini 3.1 Pro Preview — the top performer in the 117-task evaluation — succeeded on fewer than three in ten tasks. With the skill enabled, it succeeded on more than nineteen in twenty.

Technical Details

Google evaluated the Agent Skill across 117 standardized coding tasks, measuring completion rates with and without the skill active. Gemini 3.1 Pro Preview recorded a baseline success rate of 28.2% and a skill-assisted rate of 96.6%, a gain of 68.4 percentage points on the same task set.

The magnitude of improvement varied by model generation. Gemini 3 series models benefited substantially more than older Gemini 2.5 models. Google stated that the smaller gains in 2.5-series models “come down to weaker reasoning abilities,” as reported by Bastian. That finding suggests the skill functions as a context amplifier: it delivers updated information, but models must still reason effectively to apply it.

Mechanically, the skill works by injecting current SDK documentation, model information, and code samples into the coding agent’s context window at inference time. Google has also indicated it is evaluating MCP (Model Context Protocol) services as an alternative architectural approach for delivering the same category of real-time context to agents.
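In spirit, this mechanism amounts to prepending freshly fetched SDK context to the coding prompt before the model sees it. The sketch below illustrates the general pattern only; the function names, the SDK notes, and the model identifier shown are illustrative assumptions, not part of Google's published skill:

```python
# Minimal sketch of inference-time context injection, assuming the skill
# works by prepending current SDK documentation to the coding task.
# All names here are hypothetical, not a real Google API.

def load_skill_context() -> str:
    """Stand-in for fetching current Gemini SDK docs, model info,
    and sample code at inference time (the skill's job)."""
    return (
        "## Current SDK notes (fetched at inference time)\n"
        "- Use the `google-genai` package and its `genai.Client` entry point,\n"
        "  not the older `google-generativeai` package.\n"
        "- Current preview model id: gemini-3.1-pro-preview\n"
    )

def build_prompt(task: str) -> str:
    """Inject up-to-date context ahead of the user's coding task,
    replacing reliance on training-time knowledge."""
    return f"{load_skill_context()}\n## Task\n{task}"

prompt = build_prompt("Write a script that calls the Gemini API.")
```

The same prompt-assembly step is where an MCP service could plug in instead: rather than a static document bundled with the skill, the agent would query a protocol endpoint for the latest SDK context on each run.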

Who’s Affected

Developers using the Gemini API to build AI-assisted coding tools or autonomous coding agents are the primary beneficiaries. The skill is most relevant in workflows where generated code must conform to Google’s current SDK conventions — environments where a model relying on outdated training data is likely to produce deprecated patterns or reference removed methods.

A separate Vercel study, cited in Bastian’s report, proposed a competing approach: providing models with direct instructions through AGENTS.md project-level configuration files. Vercel’s findings indicated this file-based method could outperform dynamic skill injection, though no direct benchmark comparison between the two approaches was available at the time of reporting.
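To make the contrast concrete, the file-based approach pins conventions in a static project file that the agent reads at startup, rather than fetching them at inference time. The contents below are a hypothetical sketch of what such a file might hold, not an excerpt from Vercel's study:

```markdown
# AGENTS.md (hypothetical example)

## Gemini SDK conventions for this project
- Use the `google-genai` package; do not import the deprecated
  `google-generativeai` package.
- Default model: `gemini-3.1-pro-preview`.
```

The trade-off follows from the mechanism: a project file is simple and version-controlled, but it must be updated manually when the SDK changes, whereas dynamic skill injection can stay current without edits to the repository.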

What’s Next

The Agent Skill is currently available on GitHub. Google has indicated it is exploring MCP services as an additional delivery channel for real-time API context, though no timeline or comparative benchmarks for that approach were published alongside the initial announcement.

The variation in performance gains across model generations leaves open whether 2.5-series models can be made to benefit more from skill-based context through alternative skill architectures or prompt engineering. Author details for the underlying Google research were not available at the time of publication.
