Kimi K2.5: Moonshot AI's 1T Parameter Model Brings Agent Swarm to Open Source
Moonshot AI releases Kimi K2.5, a 1 trillion parameter open-source MoE model with 384 experts, native multimodal capabilities, and an Agent Swarm system that coordinates up to 100 parallel sub-agents.
A Trillion Parameters, Open and Agentic
Moonshot AI released Kimi K2.5 on January 27, 2026, delivering what may be the most architecturally ambitious open-source language model to date. With 1 trillion total parameters, a 384-expert Mixture-of-Experts design, and a native multimodal architecture trained on 15 trillion tokens of mixed visual and text data, K2.5 challenges the assumption that frontier-class capabilities require closed, proprietary development.
The model's defining feature is Agent Swarm, a system that allows a single K2.5 instance to dynamically spawn and coordinate up to 100 specialized sub-agents working in parallel. Each sub-agent operates independently with its own tool access, enabling complex multi-step workflows that previously required custom orchestration frameworks. This is not a research demo. Agent Swarm is available through Moonshot's API today.
Architecture: Sparse Efficiency at Scale
Kimi K2.5 employs a Mixture-of-Experts architecture with 384 experts distributed across 61 layers, including one dense layer. Despite its 1 trillion total parameters, the model activates only 32 billion parameters per token by selecting 8 experts plus 1 shared expert for each forward pass. This gives it a 3.2 percent activation rate, meaning the model uses a fraction of its total capacity for any given input while maintaining the knowledge breadth of a much larger model.
| Specification | Value |
|---|---|
| Total Parameters | 1 Trillion |
| Active Parameters | 32B per token |
| Number of Experts | 384 |
| Selected Experts per Token | 8 + 1 shared |
| Layers | 61 (1 dense) |
| Context Length | 256K tokens |
| Vision Encoder | MoonViT (400M params) |
| Vocabulary Size | 160K |
| Attention Mechanism | Multi-head Latent Attention (MLA) |
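The sparse-activation arithmetic behind the table can be sketched in a few lines. The router logits below are a random stand-in for the real learned gating network, which the article does not detail; only the expert counts and the 32B-of-1T ratio come from the specifications above:

```python
import numpy as np

N_EXPERTS, TOP_K = 384, 8
rng = np.random.default_rng(0)

# Stand-in for the learned router: one gating logit per expert for one token.
logits = rng.normal(size=N_EXPERTS)
routed = np.argsort(logits)[-TOP_K:]   # indices of the 8 routed experts
gates = np.exp(logits[routed])
gates /= gates.sum()                   # renormalized gate weights over the top-8

active_experts = TOP_K + 1             # plus the always-on shared expert
activation_rate = 32e9 / 1e12          # 32B active of 1T total parameters

print(active_experts)                  # 9
print(f"{activation_rate:.1%}")        # 3.2%
```

The parameter-level rate (3.2 percent) differs from the expert-count rate (9 of 384) because experts are not the only parameters; attention, embeddings, and the shared expert are always active.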
The MoonViT vision encoder, a 400M parameter component, is integrated at the pre-training level rather than bolted on as a post-training adapter. This native multimodal architecture means K2.5 processes images and video with the same fluency as text, rather than treating visual inputs as translated text descriptions.
Benchmark Performance
Kimi K2.5's benchmark results place it among the top-performing open-source models across multiple domains.
In coding, K2.5 achieves a 76.8 percent score on SWE-bench Verified, making it the strongest open-source model on this widely tracked software engineering benchmark. It also scores 73.0 on SWE-bench Multilingual and 85.0 on LiveCodeBench, demonstrating consistent coding strength across different evaluation frameworks.
In reasoning and knowledge tasks, the model scores 96.1 on AIME 2025, 87.6 on GPQA-Diamond, and 87.1 on MMLU-Pro. These numbers make it competitive with closed-source frontier models on academic reasoning benchmarks.
The multimodal benchmarks reveal the depth of its visual understanding. K2.5 scores 78.5 on MMMU-Pro, 92.3 on OCRBench for text recognition in images, and 87.4 on VideoMME for video comprehension. These results suggest that the native multimodal pre-training approach yields genuine visual reasoning capabilities rather than superficial image-to-text translation.
Agent Swarm: Coordinated Multi-Agent Execution
Agent Swarm is the most distinctive capability in K2.5 and represents a fundamentally different approach to AI agent architecture. Rather than running a single model instance that processes tasks sequentially, Agent Swarm allows K2.5 to dynamically instantiate up to 100 specialized sub-agents that work in parallel.
Each sub-agent receives a specific role and can independently use tools such as web search, code execution, file manipulation, and API calls. The primary K2.5 instance acts as an orchestrator, decomposing complex tasks into subtasks, assigning them to appropriate sub-agents, and synthesizing results.
Moonshot AI reports that Agent Swarm delivers up to a 4.5x speedup on large-scale research tasks, long-form writing, and batch operations compared to sequential single-agent execution. The benchmarks support this claim: K2.5 with Agent Swarm scores 78.4 on BrowseComp and 79.0 on WideSearch, both agentic search benchmarks that measure the ability to find and synthesize information across multiple sources.
Practical applications include:
- Research synthesis: Assigning different sub-agents to search for information on different aspects of a topic, then combining findings into a coherent report
- Batch processing: Running independent analysis tasks across dozens of documents simultaneously
- Multi-source verification: Dispatching sub-agents to cross-reference claims across multiple databases and websites
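The orchestrator pattern described above, decompose, fan out, synthesize, can be approximated client-side with a thread pool. The `sub_agent` function here is a placeholder for a real sub-agent run (web search, code execution, API calls), not Moonshot's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask: str) -> str:
    """Placeholder for one sub-agent; a real agent would invoke tools here."""
    return f"findings for {subtask}"

def orchestrate(subtasks, max_agents=100):
    # Fan subtasks out to parallel workers, mirroring the swarm's 100-agent cap,
    # then synthesize the results (stubbed here as simple concatenation).
    with ThreadPoolExecutor(max_workers=min(max_agents, len(subtasks))) as pool:
        findings = list(pool.map(sub_agent, subtasks))
    return "\n".join(findings)

report = orchestrate(["pricing history", "benchmark claims", "license terms"])
print(report)
```

In K2.5 this loop runs inside the model's serving stack rather than in client code, which is what distinguishes Agent Swarm from framework-level orchestration.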
Visual Coding: From Design to Implementation
K2.5's native multimodal training enables a capability Moonshot calls Visual Coding. Users can provide UI designs, screenshots, wireframes, or even video demonstrations, and K2.5 generates functional front-end code that reproduces the visual design.
This goes beyond simple image-to-code translation. K2.5 can interpret design intent, infer responsive layout behavior, and generate animations from video references. The workflow supports complete website generation from natural language descriptions combined with visual references, making it a practical tool for rapid prototyping.
Modes of Operation
K2.5 supports multiple operational modes through a single model:
| Mode | Purpose | Use Case |
|---|---|---|
| Instant | Fast responses without reasoning chains | Quick questions, simple tasks |
| Thinking | Extended reasoning with step-by-step analysis | Complex problems, math, coding |
| Agent | Single-agent with tool use | Research, structured content |
| Agent Swarm | Multi-agent parallel execution | Large-scale projects, batch tasks |
The Thinking mode uses configurable reasoning depth, allowing users to balance response quality against latency. In Instant mode, the model responds without generating reasoning traces, providing faster responses for straightforward queries.
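Since the endpoints are OpenAI-compatible, mode selection presumably maps onto the chat-completions request body. The model identifier and the `thinking` field below are illustrative guesses, not documented parameter names:

```python
import json

# Hypothetical request body for K2.5's OpenAI-compatible chat endpoint.
payload = {
    "model": "kimi-k2.5",  # assumed model identifier
    "messages": [{"role": "user", "content": "Summarize this design doc."}],
    "temperature": 0.6,
    "thinking": True,  # assumed vendor extension toggling Thinking vs Instant mode
}

# What an HTTP client would POST to a /v1/chat/completions-style endpoint.
body = json.dumps(payload)
```

Consult Moonshot's API documentation for the real field names before relying on this shape.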
Availability and Access
Kimi K2.5 is released under a Modified MIT License, making both the code and model weights available for commercial use. The model can be accessed through:
- Moonshot API: OpenAI and Anthropic-compatible endpoints at platform.moonshot.ai
- HuggingFace: Full model weights available at huggingface.co/moonshotai/Kimi-K2.5
- Kimi Chat: Consumer interface at kimi.com
- Kimi Code: Dedicated coding product
For self-hosted deployment, Moonshot recommends vLLM, SGLang, or KTransformers as inference engines. The 1 trillion parameter model requires significant hardware for full deployment, but the 32B active parameter count means inference costs are comparable to other models of similar active size.
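A self-hosted launch with vLLM might look like the fragment below. The model path comes from the HuggingFace link above; the parallelism and context-length flags are placeholders that must be sized to the actual GPU cluster:

```shell
# Deployment sketch only; flag values are illustrative, not a tested recipe.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --trust-remote-code
```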
A free tier is available with usage limits, and paid plans offer higher capacity for production deployments.
Competitive Positioning
Kimi K2.5 occupies a unique position in the current AI landscape. Its 76.8 percent SWE-bench score makes it the top-performing open-source model, ahead of DeepSeek V3 and Llama 4 Maverick. Against closed-source models, it trails Claude Opus 4.6 (80.8 percent) and Claude Sonnet 4.6 (79.2 percent) but outperforms GPT-5.2 (69 percent) on this benchmark.
The Agent Swarm capability has no direct equivalent in other open-source models. While frameworks like LangChain and AutoGen enable multi-agent orchestration, K2.5 implements this at the model level, eliminating the need for external orchestration infrastructure.
Limitations
Despite its impressive specifications, K2.5 has notable constraints. The 256K context window, while generous, falls short of Claude's 1 million token context and the 10 million token window offered by Llama 4 Scout. For workflows requiring extremely long context, this could be a limiting factor.
The 1 trillion parameter model is demanding to self-host. Organizations wanting to run K2.5 on their own infrastructure need substantial GPU resources, even with the efficient MoE architecture. This may limit adoption to well-resourced organizations or API-based usage.
Agent Swarm, while powerful, is still in beta. Coordinating 100 parallel sub-agents introduces complexity in error handling, result consistency, and cost management. Production deployments should expect some iteration before achieving reliable multi-agent workflows.
Conclusion
Kimi K2.5 represents a significant milestone in open-source AI. The combination of 1 trillion parameters, native multimodal capabilities, and the Agent Swarm system creates a model that is not merely competitive with closed-source alternatives but offers capabilities that most proprietary models lack. For developers and organizations seeking an open-weight model with frontier-class performance and built-in multi-agent orchestration, K2.5 sets a new standard. The modified MIT license ensures commercial viability, and the OpenAI-compatible API makes integration straightforward for teams already working with existing AI infrastructure.
Pros
- Top-performing open-source model on SWE-bench Verified at 76.8%, with strong results across reasoning and multimodal benchmarks
- Agent Swarm provides built-in multi-agent orchestration without external frameworks like LangChain or AutoGen
- Modified MIT License enables commercial use with minimal restrictions on both code and weights
- Native multimodal architecture handles text, images, and video with consistent quality across all modalities
- OpenAI-compatible API makes adoption straightforward for teams with existing AI infrastructure
Cons
- 256K context window is significantly shorter than Claude's 1M tokens or Llama 4 Scout's 10M tokens
- 1 trillion parameter model requires substantial GPU resources for self-hosted deployment despite sparse activation
- Agent Swarm remains in beta with potential challenges in error handling and cost management at scale
- Moonshot AI is a relatively new company compared to established players, raising questions about long-term model support
Key Features
Kimi K2.5 is a 1 trillion parameter open-source MoE model from Moonshot AI with 384 experts and 32B active parameters per token. It features native multimodal pre-training on 15 trillion tokens, a 256K context window, and the Agent Swarm system that coordinates up to 100 parallel sub-agents for 4.5x faster task execution. It achieves 76.8% on SWE-bench Verified (top open-source), 96.1 on AIME 2025, and supports Visual Coding from designs to functional code. Released under Modified MIT License.
Key Insights
- At 1 trillion total parameters with only 3.2% activation per token, K2.5 demonstrates that sparse MoE architectures can deliver frontier performance with practical inference costs
- Agent Swarm enables up to 100 parallel sub-agents with independent tool access, a capability not available in any other open-source model
- The 76.8% SWE-bench Verified score makes K2.5 the strongest open-source coding model, surpassing DeepSeek V3 and Llama 4 Maverick
- Native multimodal pre-training on 15 trillion mixed tokens produces genuine visual reasoning rather than superficial image-to-text translation
- MoonViT 400M parameter vision encoder is integrated at pre-training rather than post-training, enabling seamless cross-modal reasoning
- The Modified MIT License for both code and weights enables commercial deployment with minimal restrictions
- OpenAI and Anthropic-compatible API endpoints lower the integration barrier for existing AI application developers
- Agent Swarm reduces execution time by up to 4.5x for large-scale research and batch processing tasks