Meta Launches Llama 4 Scout and Maverick: First Open-Weight Multimodal MoE Models
Meta releases Llama 4 Scout (17B active, 16 experts, 10M context) and Maverick (17B active, 128 experts), its first natively multimodal mixture-of-experts models.
A New Architecture for the Llama Family
Meta released Llama 4 Scout and Llama 4 Maverick on April 5, 2025, marking the most significant architectural shift in the Llama model family to date. For the first time, Llama models are built on a mixture-of-experts (MoE) architecture and trained as natively multimodal systems from the ground up. Both models accept text, images, and video as input, and both are available as open-weight downloads on Hugging Face and llama.com.
The release follows months of anticipation after Meta announced it would invest as much as $65 billion in AI infrastructure in 2025. Llama 4 represents the technical outcome of that investment: a model family designed to compete with proprietary offerings from Google, OpenAI, and Anthropic while remaining freely available to the developer community under open-weight terms.
Llama 4 Scout: A 10-Million-Token Context Window on a Single GPU
Llama 4 Scout is the smaller of the two models, with 17 billion active parameters spread across 16 experts and 109 billion total parameters. Its defining feature is an industry-leading context window of 10 million tokens, the longest available in any open-weight model. The base model was pre-trained with a 256K context length using the iRoPE architecture, which interleaves attention layers without positional embeddings among standard rotary-embedding (RoPE) layers; the instruct-tuned version extends this to the full 10M tokens.
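To make the "17B active out of 109B total" distinction concrete, here is a minimal sketch of top-k expert routing in an MoE feed-forward layer. The dimensions are toy values, and Llama 4's actual design (which Meta describes as also using a shared expert alongside the routed ones) is more involved; this only illustrates why each token touches a small fraction of the total parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- illustrative only, far smaller than Llama 4's.
d_model, d_ff, n_experts, top_k = 8, 16, 16, 1

# One feed-forward "expert" per slot: W_in (d_model x d_ff), W_out (d_ff x d_model).
experts = [(rng.standard_normal((d_model, d_ff)) * 0.1,
            rng.standard_normal((d_ff, d_model)) * 0.1) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.1  # routing projection

def moe_forward(x):
    """Route each token to its top-k experts; only those experts run."""
    logits = x @ router                               # (tokens, n_experts)
    chosen = np.argsort(-logits, axis=-1)[:, :top_k]  # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, chosen[t]]
        gates = np.exp(sel - sel.max())               # softmax over selected
        gates /= gates.sum()
        for g, e in zip(gates, chosen[t]):
            w_in, w_out = experts[e]
            out[t] += g * (np.maximum(x[t] @ w_in, 0) @ w_out)  # ReLU FFN
    return out, chosen

tokens = rng.standard_normal((4, d_model))
y, chosen = moe_forward(tokens)
# Each token activates only top_k of n_experts experts, so the parameters
# exercised per token are a small fraction of the layer's total.
```

Scaling this idea up, the active-parameter count (17B) governs per-token compute while the total count (109B) governs memory, which is why MoE models can be cheap to run yet large to store.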
Despite its total parameter count, Scout is designed for efficient deployment. With Int4 quantization, the entire model fits on a single NVIDIA H100 GPU. This makes it accessible to research labs, startups, and individual developers who lack multi-node GPU clusters.
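A back-of-envelope check makes the single-GPU claim plausible. Int4 stores roughly half a byte per parameter, so weight memory alone (ignoring KV cache and activations, which add more) works out as:

```python
# Rough weight-only memory estimate; KV cache and activations are extra.
def weight_gb(total_params_billions, bytes_per_param):
    """Approximate weight footprint in GB for a given precision."""
    return total_params_billions * bytes_per_param  # 1e9 params * B/param / 1e9

scout_int4 = weight_gb(109, 0.5)      # ~54.5 GB -> inside an 80 GB H100
maverick_int4 = weight_gb(400, 0.5)   # ~200 GB -> needs a multi-GPU host
print(scout_int4, maverick_int4)
```

This also previews why Maverick, discussed next, requires a full H100 DGX host rather than a single card.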
Meta reports that Scout outperforms Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of benchmarks, including coding, reasoning, long-context tasks, and image understanding. The 10M context window enables use cases that were previously exclusive to API-only services: full-codebase analysis, book-length document processing, and multi-session conversational memory.
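For the full-codebase use case, a quick feasibility check is easy to sketch. Using the common (tokenizer-dependent) heuristic of roughly four characters per token for English text and code:

```python
# Rough feasibility check: does a codebase fit in a 10M-token window?
# Assumes ~4 characters per token, a common heuristic; real counts
# depend on the actual tokenizer.
def fits_in_context(total_chars, context_tokens=10_000_000, chars_per_token=4):
    est_tokens = total_chars // chars_per_token
    return est_tokens <= context_tokens, est_tokens

# e.g. a 20 MB codebase (~20 million characters):
ok, est = fits_in_context(20_000_000)
# ~5M estimated tokens -> comfortably inside a 10M-token window
```

By this estimate, codebases up to roughly 40 MB of source text could fit in a single Scout prompt, which is what makes whole-repository analysis plausible without retrieval.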
Llama 4 Maverick: 128 Experts Competing With GPT-4o
Llama 4 Maverick scales up the MoE approach dramatically, maintaining the same 17 billion active parameters but distributing computation across 128 experts, with 400 billion total parameters. The model runs on a single H100 DGX host or can be distributed across multiple nodes for higher throughput.
Maverick's benchmark performance positions it as a direct competitor to GPT-4o and Gemini 2.0 Flash. Meta claims it beats both across a broad range of widely reported benchmarks while achieving comparable results to DeepSeek V3 on reasoning and coding tasks with less than half the active parameters. On LMArena, an experimental chat-tuned version of Maverick achieves an Elo score of 1417.
The instruct-tuned Maverick model supports a 1-million-token context window, one-tenth of Scout's but still among the longest available. Its strength lies in multimodal understanding: the model uses an early fusion architecture with an improved vision encoder based on MetaCLIP, enabling superior image grounding and visual reasoning.
Training at Scale: 30 Trillion Tokens Across 200 Languages
Both Scout and Maverick were pre-trained on over 30 trillion tokens, double the training data used for Llama 3. The dataset spans 200 languages, with over 100 languages represented by at least 1 billion tokens each, a tenfold increase in multilingual coverage compared to the previous generation.
Training was conducted at FP8 precision, achieving 390 TFLOPs per GPU. Meta's early fusion approach to multimodality, where text and visual information are processed jointly from the earliest layers rather than bolted on as separate modules, distinguishes Llama 4 from many competitors that add multimodal capabilities through adapters or late-fusion techniques.
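The early-fusion idea can be shown in a few lines: image patches and text tokens are projected into the same embedding space and concatenated into one sequence before any transformer layer runs, so every layer attends jointly over both modalities. The dimensions below are toy values, not Llama 4's.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 8

# Early fusion: vision features and text tokens become ONE sequence
# from the first layer, rather than a vision adapter bolted onto a
# finished text model (late fusion).
n_patches, n_text = 6, 5
patch_feats = rng.standard_normal((n_patches, 32))   # vision encoder output
text_ids = rng.integers(0, 100, size=n_text)         # token ids

W_vision = rng.standard_normal((32, d_model)) * 0.1  # patch projection
E_text = rng.standard_normal((100, d_model)) * 0.1   # token embedding table

fused = np.concatenate([patch_feats @ W_vision, E_text[text_ids]], axis=0)
# fused.shape == (n_patches + n_text, d_model): the transformer stack
# sees image and text positions in a single attention window.
```

In an adapter-based design, by contrast, the vision pathway is trained separately and injected into a frozen language model, which is the "bolted on" approach the article contrasts with.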
The models were trained on diverse text, image, and video datasets; they were pre-trained with up to 48 images per input and have been tested successfully with up to eight images in post-training scenarios.
Llama 4 Behemoth: The Teacher Model Still in Training
Meta also disclosed Llama 4 Behemoth, a massive model with 288 billion active parameters across 16 experts and approximately 2 trillion total parameters. Behemoth was still training at the time of the Scout and Maverick release, but Meta reports it already outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks.
Behemoth serves as the teacher model for knowledge distillation into Scout and Maverick. This training approach, where smaller models learn from a much larger teacher, explains why Scout and Maverick punch well above their active parameter count. Meta has not announced a release date for Behemoth, but the upcoming LlamaCon event on April 29 may provide more details.
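The standard form of the distillation objective described here is a KL-divergence loss between temperature-softened teacher and student output distributions (Meta's exact recipe, including its codistillation details, is not fully public, so treat this as the generic technique rather than Meta's implementation):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over a logit vector."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions; the student is
    trained to reproduce the teacher's full output distribution, not
    just its argmax label."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.5]   # e.g., the large teacher's next-token logits
aligned = [3.5, 0.8, 0.4]   # student that mimics the teacher
off     = [0.2, 3.9, 1.0]   # student that disagrees
assert distill_loss(teacher, aligned) < distill_loss(teacher, off)
```

Because the soft targets carry the teacher's full ranking over the vocabulary, a 17B-active student can absorb far more signal per token than hard-label training provides, which is the mechanism behind "punching above the active parameter count."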
Integration Across Meta's Ecosystem
Llama 4 is not just an API or download. Meta is deploying these models across its consumer products, powering Meta AI on WhatsApp, Messenger, Instagram Direct, and the meta.ai website. This integration gives Llama 4 immediate access to Meta's billions of monthly active users, creating a deployment scale that no other open-weight model can match.
For developers, the models are available through Hugging Face with open weights that can be fine-tuned for specific applications. The combination of open availability and massive-scale consumer deployment is a dual strategy: Meta benefits from community improvements while using its own products as the largest testing ground.
Competitive Position
Llama 4 enters a market that has become dramatically more competitive since Llama 3's release. Google launched Gemma 4 under Apache 2.0 just days earlier on April 2, and DeepSeek V3 continues to offer strong performance at extremely low cost. OpenAI's proprietary models remain the benchmark for many enterprise customers.
Llama 4's advantages are architectural. The MoE design provides strong performance with relatively few active parameters, making inference cheaper. The 10M context window on Scout has no equal among open-weight models. And the natively multimodal training means these capabilities are not compromises or afterthoughts but core design features.
The main limitation is that Llama 4 uses a community license rather than Apache 2.0, which imposes some restrictions on commercial use and redistribution that Gemma 4 does not have. For organizations that prioritize licensing flexibility, this remains a consideration.
Conclusion
Meta's Llama 4 Scout and Maverick represent a generational leap for open-weight AI models. The combination of MoE architecture, native multimodality, 10M-token context windows, and 200-language support creates a model family that competes directly with the best proprietary offerings while remaining freely downloadable. For developers, researchers, and organizations building on open models, Llama 4 sets a new standard for what is available outside of API-only services.
Pros
- Industry-leading 10M token context window on Scout enables previously impossible long-context applications
- Open-weight availability allows full fine-tuning, inspection, and deployment without API dependencies
- MoE architecture delivers strong performance with efficient resource utilization on standard GPU hardware
- Native multimodal training produces more coherent text-image understanding than bolt-on approaches
- Massive multilingual coverage (200 languages) serves global developer communities
Cons
- Community license is more restrictive than Apache 2.0, limiting some commercial and redistribution use cases
- Maverick's 400B total parameters require a full H100 DGX host, putting it out of reach for smaller teams
- Behemoth is not yet released, meaning the full Llama 4 family is incomplete at launch
- Initial community reports suggest benchmark results may not fully reflect real-world conversational quality
Key Features
1. Scout: 17B active parameters with 16 experts (109B total); industry-leading 10M-token context window; fits on a single H100 GPU with Int4 quantization
2. Maverick: 17B active parameters with 128 experts (400B total); beats GPT-4o and Gemini 2.0 Flash on benchmarks; 1M-token context window
3. Natively multimodal MoE architecture with early fusion for text, image, and video processing from the earliest layers
4. Pre-trained on 30+ trillion tokens across 200 languages (10x the multilingual coverage of Llama 3)
5. Behemoth teacher model (288B active, ~2T total) outperforms GPT-4.5 and Claude 3.7 Sonnet on STEM benchmarks
Key Insights
- The 10M token context window on Scout is the longest available in any open-weight model, enabling full-codebase analysis and book-length document processing
- MoE architecture with 17B active parameters achieves performance comparable to models with 2-3x the active compute budget
- Native multimodality through early fusion gives Llama 4 structural advantages over competitors using adapter-based approaches
- Training on 30+ trillion tokens across 200 languages makes Llama 4 the most linguistically diverse open model family
- Deployment across WhatsApp, Messenger, and Instagram gives Llama 4 immediate access to billions of users for real-world testing
- The Behemoth teacher model with 2 trillion total parameters demonstrates Meta's investment in knowledge distillation at unprecedented scale
- Maverick achieving GPT-4o-level performance at less than half the active parameters signals a shift toward efficiency-first model design
