Google Launches Gemma 4: Four Open Models With Agentic Skills Under Apache 2.0
Google DeepMind releases Gemma 4, a family of four open-weight models from 2B to 31B parameters, under Apache 2.0, designed for advanced reasoning and edge deployment.
The Most Capable Open Models Google Has Ever Released
Google DeepMind has launched Gemma 4, a family of four open-weight models that the company calls its most intelligent open models to date. Released on April 2, 2026, under the permissive Apache 2.0 license, Gemma 4 is purpose-built for advanced reasoning and agentic workflows, delivering what Google describes as an unprecedented level of intelligence-per-parameter.
The release comes at a critical moment in the open-source AI landscape. Chinese labs like DeepSeek and Qwen have been gaining ground with competitive open models, and Meta's Llama series remains the most widely deployed open-weight family. By releasing Gemma 4 under Apache 2.0, a significant upgrade from the restrictive Gemma Use Policy that governed earlier versions, Google is making its strongest bid yet to win over the developer community.
Four Models for Every Scale
Gemma 4 arrives in four distinct sizes, organized into two tiers designed for different deployment scenarios.
The edge tier includes two compact models. Gemma 4 E2B has 2.3 billion effective parameters and is designed for smartphones, embedded devices, and IoT hardware. Gemma 4 E4B has 4.5 billion effective parameters and targets laptops and mid-range devices. Both edge models support text, image, and native audio input with 128K-token context windows, making them capable of speech recognition and visual understanding directly on-device.
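To see why these parameter counts suit on-device deployment, a back-of-envelope weight-memory estimate helps. The parameter counts below come from the article; the bytes-per-parameter figures are generic quantization assumptions, not official Gemma 4 deployment numbers, and real memory use adds activations and KV cache on top.

```python
# Rough weight-only memory estimate for the edge models at common
# quantization levels (assumption: uniform bits per parameter).

def model_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * 1e9 * (bits_per_param / 8) / 1e9

for name, params in [("E2B", 2.3), ("E4B", 4.5)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{model_memory_gb(params, bits):.2f} GB")
```

At 4-bit quantization the E2B weights fit in roughly 1.2 GB, which is why a model of this size is plausible on a modern smartphone.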
The workstation tier includes the larger models. Gemma 4 26B is a Mixture-of-Experts (MoE) architecture with 26 billion total parameters but only 3.8 billion active parameters at any given time, providing strong performance with efficient resource use. Gemma 4 31B is a dense model with 31 billion parameters, the most capable in the family. Both workstation models support text and image input with 256K-token context windows.
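The "26B total, 3.8B active" distinction comes from top-k expert routing: each token is sent to only a few experts, so most weights sit idle on any given forward pass. The sketch below illustrates the general technique in plain Python; the expert count and k are arbitrary choices for illustration, not Gemma 4's actual configuration.

```python
import math
import random

random.seed(0)
N_EXPERTS, TOP_K = 8, 2  # illustrative: route each token to 2 of 8 experts

def route(gate_logits, top_k=TOP_K):
    """Pick the top-k experts by gate score and softmax-normalize them."""
    top = sorted(range(len(gate_logits)), key=gate_logits.__getitem__)[-top_k:]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    # Only these k experts' weight matrices are touched for this token.
    return [(i, e / total) for i, e in zip(top, exps)]

logits = [random.gauss(0, 1) for _ in range(N_EXPERTS)]
print(route(logits))                               # 2 (expert, weight) pairs
print(f"active fraction: {TOP_K / N_EXPERTS:.0%}") # 25% of experts per token
```

The same logic at scale is how a 26B-parameter model can run with the per-token compute of a model several times smaller.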
Benchmark Performance That Challenges Closed Models
Gemma 4's benchmark results are remarkable for open-weight models. The 31B dense model scores 85.2% on MMLU Pro, 89.2% on AIME 2026, and 80.0% on LiveCodeBench v6, with a Codeforces Elo rating of 2,150. These numbers place it in direct competition with much larger closed-source models.
The 26B MoE model is equally impressive relative to its active parameter count. It ranks 6th on Arena AI while using only 3.8 billion active parameters, a fraction of what competing models require. This efficiency makes it particularly attractive for organizations that need strong performance without massive GPU clusters.
Across both tiers, Gemma 4 demonstrates significant improvements in mathematical reasoning, instruction-following, and multi-step planning compared to its predecessors.
Built for Agentic Workflows
Gemma 4's most forward-looking feature is its native support for agentic tasks. The models can perform multi-step planning, autonomous action execution, and tool use without specialized fine-tuning. This means developers can build AI agents that reason through complex tasks, execute code, search the web, and interact with APIs using Gemma 4 as the base model.
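The plan-act-observe loop described above can be sketched in a few lines. Everything here is hypothetical scaffolding for illustration: the tool registry, message format, and the `fake_model` stub are inventions of this sketch, and a real deployment would replace `fake_model` with actual Gemma 4 inference behind whatever serving stack is in use.

```python
import json

# Hypothetical tool registry the agent is allowed to call.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_model(messages):
    """Stand-in for the model: emits one tool call, then a final answer."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The result is {json.loads(tool_msgs[-1]['content'])}"}

def run_agent(user_prompt, model=fake_model, max_steps=5):
    """Loop: the model proposes tool calls until it returns a final answer."""
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        step = model(messages)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["args"])  # execute the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not produce an answer")

print(run_agent("What is 2 + 3?"))  # The result is 5
```

What "agentic without fine-tuning" means in practice is that the base model can fill the `fake_model` role directly, emitting well-formed tool calls from its training rather than from a task-specific adapter.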
The edge models add another dimension with native audio input processing. A smartphone running Gemma 4 E2B can listen to spoken commands, analyze images from the camera, and execute multi-step actions entirely offline, with no network round-trip latency. This opens up use cases in industrial automation, field operations, and accessibility tools where cloud connectivity is unreliable.
All models are natively trained on over 140 languages, making Gemma 4 one of the most linguistically diverse open model families available.
Ecosystem and Framework Support
Gemma 4 launches with day-one support across a comprehensive list of frameworks and platforms. Developers can use the models through Hugging Face (Transformers, TRL, Transformers.js, Candle), LiteRT-LM, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM and NeMo, LM Studio, Unsloth, SGLang, Cactus, Baseten, Docker, MaxText, Tunix, and Keras.
This broad compatibility is intentional. Since the first Gemma generation, developers have downloaded Gemma models over 400 million times and created more than 100,000 variants in what Google calls the Gemmaverse. The Apache 2.0 license removes previous restrictions on commercial use, derivative works, and redistribution, which should further accelerate community adoption.
Google has also launched the Kaggle Gemma 4 Good Hackathon, inviting developers to apply the models to challenges in health and sciences, global resilience, education, and digital equity.
Competitive Landscape
Gemma 4 enters a crowded field. Meta's Llama 4 Scout, released in 2025, introduced MoE architecture to the Llama family. DeepSeek V3 and Qwen 3.5 have established strong positions among developers building with open models. Mistral's models remain popular for European deployments.
Gemma 4's advantages are clear. The Apache 2.0 license is more permissive than Meta's community license, the edge models fill a niche that few competitors address with multimodal capabilities, and the MoE architecture provides excellent efficiency. The benchmark performance at the 31B scale is competitive with models several times its size.
The main question is whether Google can convert benchmark performance into developer adoption. Llama's ecosystem benefits and DeepSeek's cost advantages are difficult to displace, even with superior technical specifications.
Conclusion
Gemma 4 represents Google's most serious open-source AI offering to date. The combination of four model sizes, Apache 2.0 licensing, multimodal edge capabilities, and strong benchmark performance makes it a compelling choice for developers building everything from smartphone applications to enterprise agentic systems. Whether it can challenge Llama's ecosystem dominance remains to be seen, but Gemma 4 has closed the gap significantly. The models are available now on Hugging Face, Kaggle, and Google AI Studio.
Pros
- Apache 2.0 license provides broad commercial freedom, with only attribution and notice requirements
- Four model sizes cover every deployment scenario from smartphones to GPU workstations
- MoE architecture delivers near-frontier performance with a fraction of the compute cost
- Native multimodal capabilities (text, image, audio) on edge models without cloud dependency
- Broad framework support enables immediate integration with existing developer toolchains
Cons
- Largest model is 31B parameters, which may lag behind 70B+ models on the most demanding reasoning tasks
- Edge models accept audio as input but do not generate audio output, limiting speech-to-speech applications
- MoE architecture requires specialized inference infrastructure that not all deployment environments support
- Competing with Llama's massive ecosystem and community momentum remains an uphill challenge
Key Features
1. Four model sizes (E2B 2.3B, E4B 4.5B, 26B MoE, 31B dense) spanning edge devices to workstations with 128K-256K context windows
2. Apache 2.0 license replaces the restrictive Gemma Use Policy, enabling unrestricted commercial use and redistribution
3. MoE architecture in the 26B model uses only 3.8B active parameters while ranking 6th on Arena AI
4. Native multimodal capabilities including text, image, and audio input on edge models for offline deployment
5. Agentic workflow support with multi-step planning, tool use, and autonomous action without fine-tuning
Key Insights
- The shift to Apache 2.0 removes the biggest adoption barrier that limited earlier Gemma versions in commercial settings
- The 26B MoE model achieving Arena AI rank 6 with only 3.8B active parameters demonstrates exceptional efficiency-per-parameter
- Native audio input on edge models enables entirely new offline AI applications in industrial, medical, and accessibility domains
- 400 million cumulative Gemma downloads and 100,000 community variants indicate a mature developer ecosystem ready for the upgrade
- Day-one support across 17 frameworks including llama.cpp and Ollama ensures immediate compatibility with existing local AI workflows
- The Codeforces Elo rating of 2,150 on the 31B model puts its coding ability in range of models with 10x the parameter count
- Multi-language training on 140+ languages positions Gemma 4 as the most linguistically diverse open model family available