LAUNCHES

Google Releases Gemma 4 as Fully Open-Weight Model With On-Device Smartphone Support

Ryan Matsuda · Apr 11, 2026 · 3 min read
Engine Score 8/10 — Important

Google's Gemma 4 ships with fully open weights, unlocking on-device AI on phones in a major open-model release

  • Google has released Gemma 4 as a fully open-weight model under a permissive license, making it freely usable for commercial and research applications.
  • The model family includes at least one variant optimized for on-device inference on smartphones, requiring no cloud connectivity.
  • The release extends Google’s Gemma open-model ecosystem, which began in February 2024 and has expanded through four generations.
  • Gemma 4 is available via Hugging Face and Google AI Studio, consistent with prior Gemma releases.

What Happened

Google released Gemma 4 as a fully open-weight model in April 2026, making the model weights publicly available and enabling local inference on devices ranging from cloud servers down to smartphones, according to reporting by ZDNET. The release is the fourth generation of Google’s Gemma family, a line of open-weight models built on the same research infrastructure underpinning Google’s proprietary Gemini models. The on-device capability for phones marks a significant step in making capable generative AI available without a network connection.

Why It Matters

Google first introduced the Gemma family in February 2024 with 2B and 7B parameter models under a custom open license permitting commercial use, positioning them against Meta’s Llama series. Gemma 2 followed in mid-2024 with improved efficiency at 2B, 9B, and 27B scales, and Gemma 3 arrived in March 2025 with a 1B parameter model specifically engineered for mobile and on-device use alongside larger 4B, 12B, and 27B variants. Each release has incrementally lowered the hardware floor required to run capable instruction-tuned models, a trend that Gemma 4 continues with explicit phone support.

On-device AI matters for several reasons beyond novelty: it eliminates latency from round-trip API calls, removes per-token API costs for developers, and keeps user data local — a meaningful differentiator for privacy-sensitive applications. The competitive landscape includes Meta’s Llama 4 family, Microsoft’s Phi-4, and Mistral’s open-weight releases, all targeting similar deployment profiles.

Technical Details

The Gemma model family has consistently used quantization and architecture optimizations to enable inference on constrained hardware. Gemma 3’s 1B model, for instance, was designed to run within the memory envelope of mid-range Android devices using Google’s LLM Inference API and MediaPipe framework, which provides a standardized on-device deployment path. Google’s Gemma technical reports — published under the collective authorship of the Gemma Team at Google — have documented that the on-device variants use INT4 and INT8 weight quantization to reduce memory footprint without significant accuracy degradation on standard benchmarks. According to ZDNET’s coverage of the Gemma 4 release, the new generation extends these on-device capabilities further, though Google has not yet published a full technical report with independently verifiable benchmark comparisons as of this writing.
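Developers who want to try the same quantization trade-off on their own hardware can do so today with Hugging Face’s transformers and bitsandbytes integration. The sketch below uses the published Gemma 3 1B instruction-tuned checkpoint as a stand-in, since the source does not name Gemma 4 model IDs.

```python
# Minimal sketch: 4-bit (INT4-style) quantized local inference with
# transformers + bitsandbytes. The model ID is the published Gemma 3 1B
# checkpoint, used as a stand-in for the not-yet-listed Gemma 4 ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-3-1b-it"  # stand-in; swap in the Gemma 4 ID once listed

# 4-bit weight quantization roughly quarters the FP16 memory footprint,
# the same trade-off the Gemma technical reports describe for on-device variants.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant_config)

prompt = "Explain why on-device inference avoids API latency."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that bitsandbytes 4-bit loading typically requires a CUDA-capable GPU; on phones, the equivalent quantization is baked into the model bundle consumed by Google’s on-device tooling.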

The open-weight release means the model weights are publicly available for download and local fine-tuning, which distinguishes Gemma from API-only offerings. Google said in its Gemma 4 announcement that the release is intended to give developers “the ability to build powerful AI-powered applications that work anywhere, including fully offline,” per the ZDNET report.
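Because the weights live locally, fine-tuning can too. As one illustration (not a method documented in the Gemma 4 release), parameter-efficient LoRA adapters via Hugging Face’s peft library keep the trainable footprint small enough for a single consumer GPU; the hyperparameters below are illustrative defaults, not release values.

```python
# Hedged sketch: attaching LoRA adapters to an open Gemma checkpoint with peft.
# Hyperparameters and target modules are illustrative choices.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")  # stand-in ID

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```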

Who’s Affected

Android and iOS developers building applications that require natural language understanding — including summarization, chat interfaces, and on-device search — gain a production-ready option that avoids API costs and data transmission. Enterprise teams in regulated industries such as healthcare and finance, where data residency constraints limit cloud AI adoption, are direct beneficiaries of open-weight on-device models. Edge AI hardware vendors, including chipmakers building inference accelerators for mobile, also stand to benefit as capable open models drive demand for optimized inference silicon.

What’s Next

Google is expected to update its LLM Inference API documentation and MediaPipe tooling to support Gemma 4, following the same integration pattern used for Gemma 3. Developers can access the model weights through Google’s Hugging Face organization and through Google AI Studio, where no-code experimentation is possible before local deployment. A full technical report from the Gemma Team at Google, detailing training methodology, safety evaluations, and benchmark results, is anticipated to follow the initial release.
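In practice, pulling the weights down locally looks like the sketch below, which uses the huggingface_hub library. The repo ID is the published Gemma 3 checkpoint as a placeholder, since the Gemma 4 repo name was not given in the source; note that Gemma repos on Hugging Face are gated behind a license acknowledgment, so an access token is required.

```python
# Hedged sketch: downloading open Gemma weights for local use.
# Gemma repos on Hugging Face are gated, so accept the license on the
# model page first, then authenticate with a token.
from huggingface_hub import login, snapshot_download

login()  # prompts for a Hugging Face access token

local_dir = snapshot_download(
    repo_id="google/gemma-3-1b-it",  # placeholder; use the Gemma 4 repo once listed
)
print(f"Weights available at: {local_dir}")
```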
