Feb 17, 2026

Qwen 3.5 Launches: Alibaba's 397B MoE Model Targets the Agentic AI Era

Alibaba releases Qwen3.5-397B-A17B, a sparse MoE model activating only 17B parameters per token, claiming to outperform GPT-5.2 and Claude Opus 4.5 on 80% of benchmarks at 60% lower cost.

#Qwen 3.5#Alibaba#MoE#Mixture of Experts#Agentic AI

Alibaba Enters the Agentic AI Race with Qwen 3.5

On February 16, 2026, Alibaba's Qwen team officially released Qwen3.5-397B-A17B, the first model in the Qwen 3.5 series and the company's most ambitious open-weight AI model to date. Packing 397 billion total parameters into a sparse Mixture-of-Experts (MoE) architecture that activates only 17 billion parameters per forward pass, Qwen 3.5 is designed explicitly for the agentic AI era, where models must operate autonomously across complex multi-step workflows.

The release comes at a strategically charged moment. Nearly every major Chinese AI developer unveiled new flagship models in the same week, and the global competition between Chinese and American AI labs has never been more intense. Qwen 3.5 is Alibaba's answer to that pressure, and the benchmark numbers suggest it deserves serious attention.

Architecture: A New Approach to Efficiency

The most technically distinctive feature of Qwen 3.5 is its hybrid attention architecture combined with a massive expert pool. The MoE layer employs 512 experts, activating 10 routed experts plus 1 shared expert per token. This expert pool is significantly larger than in typical MoE designs, but the individual experts are small (intermediate dimension of 1,024), which improves cache efficiency during inference.
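The parameter math follows directly from the routing: only 11 of 512 experts (10 routed plus the shared one) run per token, which is how a 397B-parameter model activates just 17B. A minimal numpy sketch of this style of top-k routing follows; the scoring, selection, and weight-renormalization details are assumptions for illustration, not Qwen's published implementation. Only the expert counts come from the article.

```python
import numpy as np

def moe_route(hidden, router_w, shared_expert, experts, k=10):
    """Route one token: pick k experts by router score, always add the shared expert."""
    logits = hidden @ router_w                       # (512,) one score per expert
    top_idx = np.argsort(logits)[-k:]                # indices of the k best experts
    weights = np.exp(logits[top_idx] - logits[top_idx].max())
    weights /= weights.sum()                         # renormalize over the chosen k
    out = shared_expert(hidden)                      # shared expert always fires
    for w, i in zip(weights, top_idx):               # only k of 512 experts compute
        out = out + w * experts[i](hidden)
    return out

# Tiny demo: 512 "experts" that are just small linear maps
rng = np.random.default_rng(0)
d = 64
experts = [lambda x, W=rng.normal(size=(d, d)) / d: x @ W for _ in range(512)]
shared = lambda x, W=rng.normal(size=(d, d)) / d: x @ W
router = rng.normal(size=(d, 512))
y = moe_route(rng.normal(size=d), router, shared, experts)
print(y.shape)  # (64,)
```

The compute saving is the point: per token, only the router plus 11 small expert MLPs run, regardless of how many experts exist in total.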

What sets Qwen 3.5 apart architecturally is its attention mechanism. Three out of four sublayers use GatedDeltaNet (GDN), a state-based recurrence architecture that delivers near-linear scaling with sequence length. Only one sublayer uses full GatedAttention. This hybrid design is the key to Qwen 3.5's ability to handle extremely long contexts efficiently.
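The 3:1 split can be pictured as a repeating layer schedule. A toy sketch, assuming the pattern simply repeats every four sublayers; the article gives the ratio, and the exact ordering here is my assumption:

```python
def sublayer_kinds(n):
    # Assumed repeating pattern: three GatedDeltaNet (near-linear in
    # sequence length) sublayers, then one full GatedAttention sublayer.
    return ["gated_attention" if (i + 1) % 4 == 0 else "gated_deltanet"
            for i in range(n)]

schedule = sublayer_kinds(8)
print(schedule)
```

Because only a quarter of sublayers pay the quadratic attention cost, the bulk of long-context compute scales roughly linearly with sequence length.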

The model supports a 1 million token context window through Qwen3.5-Plus, the hosted inference variant on Alibaba Cloud Model Studio. The open-weight version ships with Apache 2.0 licensing and can be self-hosted on 8xH100 GPUs, achieving approximately 45 tokens per second throughput.
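A back-of-envelope check shows why 8xH100 is the stated floor. Assuming FP8 storage at one byte per parameter and counting weights only (KV cache and serving overhead vary by stack and are not from the article):

```python
# Weights-only memory estimate for self-hosting Qwen3.5-397B-A17B.
total_params_b = 397          # billions of parameters (article figure)
bytes_per_param = 1           # FP8 assumption: 1 byte per parameter
weights_gb = total_params_b * bytes_per_param    # ~397 GB of weights

gpus, gb_per_gpu = 8, 80      # 8 x H100 80GB (article's stated setup)
cluster_gb = gpus * gb_per_gpu                   # 640 GB total HBM
headroom_gb = cluster_gb - weights_gb            # left for KV cache, activations
print(weights_gb, cluster_gb, headroom_gb)       # 397 640 243
```

The ~243 GB of headroom is what the long-context KV cache has to fit into, which is where the architecture's 95% activation-memory reduction earns its keep.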

The vocabulary has been expanded to 250,000 tokens covering 201 languages, compared to the previous generation's 152,000 tokens and 119 languages. Training was conducted natively in FP8 precision, a departure from the BF16/FP16 pipelines used by Qwen3-Max.

Benchmark Performance: Strong but Nuanced

Alibaba claims Qwen 3.5 outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on 80% of evaluated benchmark categories. The headline numbers are impressive, but a closer look reveals a more nuanced picture.

On reasoning tasks, Qwen3.5-397B-A17B scores 88.4 on GPQA Diamond (graduate-level reasoning) and 87.8 on MMLU-Pro. These are competitive with frontier models. On AIME26, a math olympiad benchmark, it reaches 91.3, which is strong but still trails GPT-5.2 at 96.7 and Claude Opus 4.5 at 93.3.

Coding performance is similarly strong. LiveCodeBench v6 shows a score of 83.6, and SWE-bench Verified comes in at 76.4. Again, GPT-5.2 leads on LiveCodeBench with 87.7, but the gap is not insurmountable, especially considering Qwen 3.5's significant cost advantage.

Where Qwen 3.5 genuinely stands out is in agentic and instruction-following tasks. It posts the best scores in the field on IFBench (76.5) and MultiChallenge (67.6). On TAU2, which measures autonomous agent performance, Qwen 3.5 scores 86.7. BrowseComp, testing web browsing agent capability, comes in at 78.6. These results suggest the model was specifically optimized for the autonomous task execution workflows that Alibaba is betting on.

Multimodal performance is also notable. As a native vision-language model trained with early fusion, Qwen 3.5 achieves 85.0 on MMMU, 79.0 on MMMU-Pro, 90.8 on OmniDocBench v1.5, and 87.5 on Video-MME. The native multimodal integration means visual understanding is not bolted on as an afterthought but woven into the model's core architecture.

Cost and Efficiency: The Real Competitive Edge

Benchmark scores tell only part of the story. The cost and efficiency improvements are arguably Qwen 3.5's most compelling selling point.

The model decodes 8.6x faster than Qwen3-Max at 32K context, rising to 19x at 256K context, with the gap widening as sequences grow. The architecture achieves a 95% reduction in activation memory compared to dense equivalents, which is what makes the 1 million token context window practical.

Pricing for Qwen3.5-Plus on Alibaba Cloud Model Studio is approximately $0.18 per 1 million tokens, roughly 60% cheaper than the previous generation. The expanded 250K vocabulary also contributes to cost savings: non-English text requires 10-60% fewer tokens, making the model particularly cost-effective for multilingual applications.
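At those rates, the savings are easy to sketch. A hedged cost calculation using the article's $0.18-per-1M-token figure and its 10-60% multilingual token-savings range; the 50M-token daily workload is a hypothetical chosen for illustration:

```python
price_per_m = 0.18   # $ per 1M tokens (article's Qwen3.5-Plus figure)

def monthly_cost(tokens_per_day):
    """Monthly spend in dollars for a steady daily token volume."""
    return tokens_per_day * 30 / 1e6 * price_per_m

daily = 50e6                              # hypothetical 50M-token/day workload
base = monthly_cost(daily)                # ~$270/month at the full token count
low  = monthly_cost(daily * (1 - 0.10))   # 10% fewer tokens (multilingual) -> ~$243
high = monthly_cost(daily * (1 - 0.60))   # 60% fewer tokens -> ~$108
print(round(base, 2), round(low, 2), round(high, 2))
```

For multilingual workloads the vocabulary savings compound with the per-token price cut, since both reductions multiply.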

For organizations that prefer self-hosting, the open-weight model under Apache 2.0 license eliminates API costs entirely. Running on 8xH100 GPUs, the model achieves production-grade throughput, making it accessible to well-resourced enterprises and research labs.

Visual Agentic Capabilities

Qwen 3.5 introduces visual agentic capabilities that allow it to operate independently across mobile and desktop applications. This means the model can interpret screen content, navigate user interfaces, and execute multi-step tasks that involve visual understanding.

This capability positions Qwen 3.5 as more than just a language model or even a multimodal model. It is designed to be a foundation for AI agents that interact with software the way humans do, by seeing and clicking rather than relying solely on API integrations. While computer-use agents from Anthropic and others have explored this space, Qwen 3.5's native multimodal training gives it a potentially more seamless approach to visual interaction.

Competitive Landscape

Qwen 3.5 enters a crowded field. GPT-5.2 from OpenAI still leads on several key benchmarks, particularly in math and competitive coding. Claude Opus 4.5 maintains its strengths in nuanced reasoning and safety. Gemini 3 Pro offers deep integration with Google's ecosystem.

Among Chinese competitors, ByteDance's Doubao 2.0 and the anticipated next DeepSeek release add further pressure. The fact that nearly every major Chinese AI developer released new models in the same week underscores the intensity of the domestic competition.

However, Qwen 3.5's combination of open weights, Apache 2.0 licensing, strong agentic performance, and dramatically lower costs creates a distinct value proposition. For developers and enterprises that prioritize cost efficiency, self-hosting flexibility, and agentic capabilities, Qwen 3.5 is now the most compelling option in the open-weight space.

Pros and Cons at a Glance

Qwen 3.5's strengths are clear: frontier-level performance at a fraction of the cost, a genuinely innovative hybrid attention architecture, native multimodal capabilities, and the most generous licensing terms among models of this caliber. The 1 million token context window and visual agentic features position it well for the next wave of AI applications.

The limitations are equally real. Despite the 80% claim, it trails GPT-5.2 and Claude Opus 4.5 on several high-profile benchmarks. The 397B parameter count means self-hosting still requires substantial infrastructure. And while the model is open-weight, the ecosystem of tools and integrations around it is less mature than what OpenAI or Anthropic offer.

Conclusion

Qwen3.5-397B-A17B represents a significant leap forward for Alibaba's AI ambitions and for the open-weight model ecosystem broadly. Its hybrid MoE architecture with GatedDeltaNet attention is a genuine technical innovation, and the benchmark results demonstrate that open-weight models can compete at the frontier. For developers building agentic AI applications, enterprises seeking cost-effective alternatives to proprietary APIs, and researchers interested in novel architectures, Qwen 3.5 is one of the most important model releases of early 2026.

Pros

  • Frontier-level benchmark performance at 60% lower cost than predecessors, with $0.18 per 1M token pricing
  • Innovative hybrid MoE architecture with GatedDeltaNet delivers near-linear scaling and 1M token context window
  • Apache 2.0 open-weight licensing enables full self-hosting and commercial use without restrictions
  • Best-in-class agentic and instruction-following performance on IFBench, TAU2, and BrowseComp benchmarks
  • Native multimodal capabilities with strong vision-language performance across MMMU, OmniDocBench, and Video-MME

Cons

  • Trails GPT-5.2 and Claude Opus 4.5 on high-profile math and coding benchmarks like AIME26 and LiveCodeBench
  • Self-hosting the 397B parameter model requires 8xH100 GPUs, which limits accessibility for smaller organizations
  • The tooling and integration ecosystem is less mature than OpenAI or Anthropic offerings
  • Independent third-party verification of Alibaba's benchmark claims remains limited at launch


Key Features

Qwen3.5-397B-A17B is Alibaba's flagship open-weight model featuring a sparse Mixture-of-Experts architecture with 397 billion total parameters and only 17 billion active per token. It employs 512 experts with a hybrid GatedDeltaNet attention mechanism for near-linear sequence scaling. The model supports 1 million token context, 201 languages, native vision-language capabilities, and visual agentic features. It delivers 8.6x-19x faster decoding than predecessors at 60% lower cost, with Apache 2.0 licensing.

Key Insights

  • Qwen 3.5 uses 512 MoE experts with a hybrid GatedDeltaNet attention mechanism, activating only 17B of 397B parameters per token
  • Alibaba claims the model outperforms GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on 80% of evaluated benchmark categories
  • IFBench score of 76.5 and TAU2 score of 86.7 are best-in-class for instruction following and autonomous agent tasks
  • Decoding speed is 8.6x to 19x faster than Qwen3-Max depending on context length, with 95% activation memory reduction
  • Pricing at approximately $0.18 per 1M tokens represents a 60% cost reduction from the previous generation
  • The expanded 250K token vocabulary covering 201 languages reduces non-English token usage by 10-60%
  • Native vision-language training with early fusion enables visual agentic capabilities across desktop and mobile applications
  • Apache 2.0 licensing and self-hosting on 8xH100 GPUs make it the most accessible frontier-class open-weight model
