Feb 17, 2026

Qwen 3.5 Launches: Alibaba's 397B MoE Model Targets the Agentic AI Era

Alibaba releases Qwen3.5-397B-A17B, a sparse MoE model activating only 17B parameters per token, claiming to outperform GPT-5.2 and Claude Opus 4.5 on 80% of benchmarks at 60% lower cost.

#Qwen 3.5#Alibaba#MoE#Mixture of Experts#Agentic AI

Alibaba Enters the Agentic AI Race with Qwen 3.5

On February 16, 2026, Alibaba's Qwen team officially released Qwen3.5-397B-A17B, the first model in the Qwen 3.5 series and the company's most ambitious open-weight AI model to date. Packing 397 billion total parameters into a sparse Mixture-of-Experts (MoE) architecture that activates only 17 billion parameters per forward pass, Qwen 3.5 is designed explicitly for the agentic AI era, where models must operate autonomously across complex multi-step workflows.

The release comes at a strategically charged moment. Nearly every major Chinese AI developer unveiled new flagship models in the same week, and the global competition between Chinese and American AI labs has never been more intense. Qwen 3.5 is Alibaba's answer to that pressure, and the benchmark numbers suggest it deserves serious attention.

Architecture: A New Approach to Efficiency

The most technically distinctive feature of Qwen 3.5 is its hybrid attention architecture combined with a massive expert pool. The MoE layer employs 512 experts, activating 10 routed experts plus 1 shared expert per token. This expert pool is significantly larger than in typical MoE designs, but the individual experts are small (intermediate dimension of 1,024), which improves cache efficiency during inference.
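The parameter math follows directly from the routing: only 11 of 512 experts (10 routed plus the shared one) run per token, which is how a 397B-parameter model activates just 17B. A minimal numpy sketch of this style of top-k routing follows; the scoring, selection, and weight-renormalization details are assumptions for illustration, not Qwen's published implementation. Only the expert counts come from the article.

```python
import numpy as np

def moe_route(hidden, router_w, shared_expert, experts, k=10):
    """Route one token: pick k experts by router score, always add the shared expert."""
    logits = hidden @ router_w                       # (512,) one score per expert
    top_idx = np.argsort(logits)[-k:]                # indices of the k best experts
    weights = np.exp(logits[top_idx] - logits[top_idx].max())
    weights /= weights.sum()                         # renormalize over the chosen k
    out = shared_expert(hidden)                      # shared expert always fires
    for w, i in zip(weights, top_idx):               # only k of 512 experts compute
        out = out + w * experts[i](hidden)
    return out

# Tiny demo: 512 "experts" that are just small linear maps
rng = np.random.default_rng(0)
d = 64
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(512)]
shared = lambda x, W=rng.normal(size=(d, d)) / d: x @ W
router = rng.normal(size=(d, 512))
y = moe_route(rng.normal(size=d), router, shared, experts)
print(y.shape)  # (64,)
```

The compute saving is the point: per token, only the router plus 11 small expert MLPs run, regardless of how many experts exist in total.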

What sets Qwen 3.5 apart architecturally is its attention mechanism. Three out of four sublayers use GatedDeltaNet (GDN), a state-based recurrence architecture that delivers near-linear scaling with sequence length. Only one sublayer uses full GatedAttention. This hybrid design is the key to Qwen 3.5's ability to handle extremely long contexts efficiently.
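The 3:1 split can be pictured as a repeating layer schedule. A toy sketch, assuming the pattern simply repeats every four sublayers; the article gives the ratio, and the exact ordering here is my assumption:

```python
def sublayer_kinds(n):
    # Assumed repeating pattern: three GatedDeltaNet (near-linear in
    # sequence length) sublayers, then one full GatedAttention sublayer.
    return ["gated_attention" if (i + 1) % 4 == 0 else "gated_deltanet"
            for i in range(n)]

schedule = sublayer_kinds(8)
print(schedule)
```

Because only a quarter of sublayers pay the quadratic attention cost, the bulk of long-context compute scales roughly linearly with sequence length.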

The model supports a 1 million token context window through Qwen3.5-Plus, the hosted inference variant on Alibaba Cloud Model Studio. The open-weight version ships with Apache 2.0 licensing and can be self-hosted on 8xH100 GPUs, achieving approximately 45 tokens per second throughput.
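A back-of-envelope check shows why 8xH100 is the stated floor. Assuming FP8 storage at one byte per parameter and counting weights only (KV cache and serving overhead vary by stack and are not from the article):

```python
# Weights-only memory estimate for self-hosting Qwen3.5-397B-A17B.
total_params_b = 397          # billions of parameters (article figure)
bytes_per_param = 1           # FP8 assumption: 1 byte per parameter
weights_gb = total_params_b * bytes_per_param    # ~397 GB of weights

gpus, gb_per_gpu = 8, 80      # 8 x H100 80GB (article's stated setup)
cluster_gb = gpus * gb_per_gpu                   # 640 GB total HBM
headroom_gb = cluster_gb - weights_gb            # left for KV cache, activations
print(weights_gb, cluster_gb, headroom_gb)       # 397 640 243
```

The ~243 GB of headroom is what the long-context KV cache has to fit into, which is where the architecture's 95% activation-memory reduction earns its keep.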

The vocabulary has been expanded to 250,000 tokens covering 201 languages, compared to the previous generation's 152,000 tokens and 119 languages. Training was conducted natively in FP8 precision, a departure from the BF16/FP16 pipelines used by Qwen3-Max.

Benchmark Performance: Strong but Nuanced

Alibaba claims Qwen 3.5 outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on 80% of evaluated benchmark categories. The headline numbers are impressive, but a closer look reveals a more nuanced picture.

On reasoning tasks, Qwen3.5-397B-A17B scores 88.4 on GPQA Diamond (graduate-level reasoning) and 87.8 on MMLU-Pro. These are competitive with frontier models. On AIME26, a math olympiad benchmark, it reaches 91.3, which is strong but still trails GPT-5.2 at 96.7 and Claude Opus 4.5 at 93.3.

Coding performance is similarly strong. LiveCodeBench v6 shows a score of 83.6, and SWE-bench Verified comes in at 76.4. Again, GPT-5.2 leads on LiveCodeBench with 87.7, but the gap is not insurmountable, especially considering Qwen 3.5's significant cost advantage.

Where Qwen 3.5 genuinely stands out is in agentic and instruction-following tasks. It posts the best scores in the field on IFBench (76.5) and MultiChallenge (67.6). On TAU2, which measures autonomous agent performance, Qwen 3.5 scores 86.7. BrowseComp, testing web browsing agent capability, comes in at 78.6. These results suggest the model was specifically optimized for the autonomous task execution workflows that Alibaba is betting on.

Multimodal performance is also notable. As a native vision-language model trained with early fusion, Qwen 3.5 achieves 85.0 on MMMU, 79.0 on MMMU-Pro, 90.8 on OmniDocBench v1.5, and 87.5 on Video-MME. The native multimodal integration means visual understanding is not bolted on as an afterthought but woven into the model's core architecture.

Cost and Efficiency: The Real Competitive Edge

Benchmark scores tell only part of the story. The cost and efficiency improvements are arguably Qwen 3.5's most compelling selling point.

The model decodes 8.6x faster than Qwen3-Max at 32K context, rising to 19x at 256K context, with the gap widening as sequences grow. The architecture achieves a 95% reduction in activation memory compared to dense equivalents, which is what makes the 1 million token context window practical.

Pricing for Qwen3.5-Plus on Alibaba Cloud Model Studio is approximately $0.18 per 1 million tokens, roughly 60% cheaper than the previous generation. The expanded 250K vocabulary also contributes to cost savings: non-English text requires 10-60% fewer tokens, making the model particularly cost-effective for multilingual applications.
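At those rates, the savings are easy to sketch. A hedged cost calculation using the article's $0.18-per-1M-token figure and its 10-60% multilingual token-savings range; the 50M-token daily workload is a hypothetical chosen for illustration:

```python
price_per_m = 0.18   # $ per 1M tokens (article's Qwen3.5-Plus figure)

def monthly_cost(tokens_per_day):
    """Monthly spend in dollars for a steady daily token volume."""
    return tokens_per_day * 30 / 1e6 * price_per_m

daily = 50e6                              # hypothetical 50M-token/day workload
base = monthly_cost(daily)                # ~$270/month at the full token count
low  = monthly_cost(daily * (1 - 0.10))   # 10% fewer tokens (multilingual) -> ~$243
high = monthly_cost(daily * (1 - 0.60))   # 60% fewer tokens -> ~$108
print(round(base, 2), round(low, 2), round(high, 2))
```

For multilingual workloads the vocabulary savings compound with the per-token price cut, since both reductions multiply.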

For organizations that prefer self-hosting, the open-weight model under Apache 2.0 license eliminates API costs entirely. Running on 8xH100 GPUs, the model achieves production-grade throughput, making it accessible to well-resourced enterprises and research labs.

Visual Agentic Capabilities

Qwen 3.5 introduces visual agentic capabilities that allow it to operate independently across mobile and desktop applications. This means the model can interpret screen content, navigate user interfaces, and execute multi-step tasks that involve visual understanding.

This capability positions Qwen 3.5 as more than just a language model or even a multimodal model. It is designed to be a foundation for AI agents that interact with software the way humans do, by seeing and clicking rather than relying solely on API integrations. While computer-use agents from Anthropic and others have explored this space, Qwen 3.5's native multimodal training gives it a potentially more seamless approach to visual interaction.

Competitive Landscape

Qwen 3.5 enters a crowded field. GPT-5.2 from OpenAI still leads on several key benchmarks, particularly in math and competitive coding. Claude Opus 4.5 maintains its strengths in nuanced reasoning and safety. Gemini 3 Pro offers deep integration with Google's ecosystem.

Among Chinese competitors, ByteDance's Doubao 2.0 and the anticipated next DeepSeek release add further pressure. The fact that nearly every major Chinese AI developer released new models in the same week underscores the intensity of the domestic competition.

However, Qwen 3.5's combination of open weights, Apache 2.0 licensing, strong agentic performance, and dramatically lower costs creates a distinct value proposition. For developers and enterprises that prioritize cost efficiency, self-hosting flexibility, and agentic capabilities, Qwen 3.5 is now the most compelling option in the open-weight space.

Pros and Cons at a Glance

Qwen 3.5's strengths are clear: frontier-level performance at a fraction of the cost, a genuinely innovative hybrid attention architecture, native multimodal capabilities, and the most generous licensing terms among models of this caliber. The 1 million token context window and visual agentic features position it well for the next wave of AI applications.

The limitations are equally real. Despite the 80% claim, it trails GPT-5.2 and Claude Opus 4.5 on several high-profile benchmarks. The 397B parameter count means self-hosting still requires substantial infrastructure. And while the model is open-weight, the ecosystem of tools and integrations around it is less mature than what OpenAI or Anthropic offer.

Conclusion

Qwen3.5-397B-A17B represents a significant leap forward for Alibaba's AI ambitions and for the open-weight model ecosystem broadly. Its hybrid MoE architecture with GatedDeltaNet attention is a genuine technical innovation, and the benchmark results demonstrate that open-weight models can compete at the frontier. For developers building agentic AI applications, enterprises seeking cost-effective alternatives to proprietary APIs, and researchers interested in novel architectures, Qwen 3.5 is one of the most important model releases of early 2026.

Pros

  • Frontier-level benchmark performance at 60% lower cost than predecessors, with $0.18 per 1M token pricing
  • Innovative hybrid MoE architecture with GatedDeltaNet delivers near-linear scaling and 1M token context window
  • Apache 2.0 open-weight licensing enables full self-hosting and commercial use without restrictions
  • Best-in-class agentic and instruction-following performance on IFBench, TAU2, and BrowseComp benchmarks
  • Native multimodal capabilities with strong vision-language performance across MMMU, OmniDocBench, and Video-MME

Cons

  • Trails GPT-5.2 and Claude Opus 4.5 on high-profile math and coding benchmarks like AIME26 and LiveCodeBench
  • Self-hosting the 397B parameter model requires 8xH100 GPUs, which limits accessibility for smaller organizations
  • The tooling and integration ecosystem is less mature than OpenAI or Anthropic offerings
  • Independent third-party verification of Alibaba's benchmark claims remains limited at launch


Key Features

Qwen3.5-397B-A17B is Alibaba's flagship open-weight model featuring a sparse Mixture-of-Experts architecture with 397 billion total parameters and only 17 billion active per token. It employs 512 experts with a hybrid GatedDeltaNet attention mechanism for near-linear sequence scaling. The model supports 1 million token context, 201 languages, native vision-language capabilities, and visual agentic features. It delivers 8.6x-19x faster decoding than predecessors at 60% lower cost, with Apache 2.0 licensing.

Key Insights

  • Qwen 3.5 uses 512 MoE experts with a hybrid GatedDeltaNet attention mechanism, activating only 17B of 397B parameters per token
  • Alibaba claims the model outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on 80% of evaluated benchmark categories
  • IFBench score of 76.5 and TAU2 score of 86.7 are best-in-class for instruction following and autonomous agent tasks
  • Decoding speed is 8.6x to 19x faster than Qwen3-Max depending on context length, with 95% activation memory reduction
  • Pricing at approximately $0.18 per 1M tokens represents a 60% cost reduction from the previous generation
  • The expanded 250K token vocabulary covering 201 languages reduces non-English token usage by 10-60%
  • Native vision-language training with early fusion enables visual agentic capabilities across desktop and mobile applications
  • Apache 2.0 licensing and self-hosting on 8xH100 GPUs make it the most accessible frontier-class open-weight model
