Kimi K2.5: Moonshot AI's 1T Parameter Model Brings Agent Swarm to Open Source
Moonshot AI releases Kimi K2.5, a 1 trillion parameter open-source MoE model with 384 experts, native multimodal capabilities, and an Agent Swarm system that coordinates up to 100 parallel sub-agents.
A Trillion Parameters, Open and Agentic
Moonshot AI released Kimi K2.5 on January 27, 2026, delivering what may be the most architecturally ambitious open-source language model to date. With 1 trillion total parameters, a 384-expert Mixture-of-Experts design, and a native multimodal architecture trained on 15 trillion tokens of mixed visual and text data, K2.5 challenges the assumption that frontier-class capabilities require closed, proprietary development.
The model's defining feature is Agent Swarm, a system that allows a single K2.5 instance to dynamically spawn and coordinate up to 100 specialized sub-agents working in parallel. Each sub-agent operates independently with its own tool access, enabling complex multi-step workflows that previously required custom orchestration frameworks. This is not a research demo. Agent Swarm is available through Moonshot's API today.
Architecture: Sparse Efficiency at Scale
Kimi K2.5 employs a Mixture-of-Experts architecture with 384 experts distributed across 61 layers, including one dense layer. Despite its 1 trillion total parameters, the model activates only 32 billion parameters per token by selecting 8 experts plus 1 shared expert for each forward pass. This gives it a 3.2 percent activation rate, meaning the model uses a fraction of its total capacity for any given input while maintaining the knowledge breadth of a much larger model.
| Specification | Value |
|---|---|
| Total Parameters | 1 Trillion |
| Active Parameters | 32B per token |
| Number of Experts | 384 |
| Selected Experts per Token | 8 + 1 shared |
| Layers | 61 (1 dense) |
| Context Length | 256K tokens |
| Vision Encoder | MoonViT (400M params) |
| Vocabulary Size | 160K |
| Attention Mechanism | Multi-head Latent Attention (MLA) |
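The sparse-activation arithmetic behind the table can be sketched in a few lines. The router logits below are a random stand-in for the real learned gating network, which the article does not detail; only the expert counts and the 32B-of-1T ratio come from the specifications above:

```python
import numpy as np

N_EXPERTS, TOP_K = 384, 8
rng = np.random.default_rng(0)

# Stand-in for the learned router: one gating logit per expert for one token.
logits = rng.normal(size=N_EXPERTS)
routed = np.argsort(logits)[-TOP_K:]   # indices of the 8 routed experts
gates = np.exp(logits[routed])
gates /= gates.sum()                   # renormalized gate weights over the top-8

active_experts = TOP_K + 1             # plus the always-on shared expert
activation_rate = 32e9 / 1e12          # 32B active of 1T total parameters

print(active_experts)                  # 9
print(f"{activation_rate:.1%}")        # 3.2%
```

The parameter-level rate (3.2 percent) differs from the expert-count rate (9 of 384) because experts are not the only parameters; attention, embeddings, and the shared expert are always active.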
The MoonViT vision encoder, a 400M parameter component, is integrated at the pre-training level rather than bolted on as a post-training adapter. This native multimodal architecture means K2.5 processes images and video with the same fluency as text, rather than treating visual inputs as translated text descriptions.
Benchmark Performance
Kimi K2.5's benchmark results place it among the top-performing open-source models across multiple domains.
In coding, K2.5 achieves a 76.8 percent score on SWE-bench Verified, making it the strongest open-source model on this widely tracked software engineering benchmark. It also scores 73.0 on SWE-bench Multilingual and 85.0 on LiveCodeBench, demonstrating consistent coding strength across different evaluation frameworks.
In reasoning and knowledge tasks, the model scores 96.1 on AIME 2025, 87.6 on GPQA-Diamond, and 87.1 on MMLU-Pro. These numbers make it competitive with closed-source frontier models on academic reasoning benchmarks.
The multimodal benchmarks reveal the depth of its visual understanding. K2.5 scores 78.5 on MMMU-Pro, 92.3 on OCRBench for text recognition in images, and 87.4 on VideoMME for video comprehension. These results suggest that the native multimodal pre-training approach yields genuine visual reasoning capabilities rather than superficial image-to-text translation.
Agent Swarm: Coordinated Multi-Agent Execution
Agent Swarm is the most distinctive capability in K2.5 and represents a fundamentally different approach to AI agent architecture. Rather than running a single model instance that processes tasks sequentially, Agent Swarm allows K2.5 to dynamically instantiate up to 100 specialized sub-agents that work in parallel.
Each sub-agent receives a specific role and can independently use tools such as web search, code execution, file manipulation, and API calls. The primary K2.5 instance acts as an orchestrator, decomposing complex tasks into subtasks, assigning them to appropriate sub-agents, and synthesizing results.
Moonshot AI reports that Agent Swarm delivers up to a 4.5x speedup on large-scale research tasks, long-form writing, and batch operations compared to sequential single-agent execution. The benchmarks support this claim: K2.5 with Agent Swarm scores 78.4 on BrowseComp and 79.0 on WideSearch, both agentic search benchmarks that measure the ability to find and synthesize information across multiple sources.
Practical applications include:
- Research synthesis: Assigning different sub-agents to search for information on different aspects of a topic, then combining findings into a coherent report
- Batch processing: Running independent analysis tasks across dozens of documents simultaneously
- Multi-source verification: Dispatching sub-agents to cross-reference claims across multiple databases and websites
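The orchestrator pattern described above, decompose, fan out, synthesize, can be approximated client-side with a thread pool. The `sub_agent` function here is a placeholder for a real sub-agent run (web search, code execution, API calls), not Moonshot's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask: str) -> str:
    """Placeholder for one sub-agent; a real agent would invoke tools here."""
    return f"findings for {subtask}"

def orchestrate(subtasks, max_agents=100):
    # Fan subtasks out to parallel workers, mirroring the swarm's 100-agent cap,
    # then synthesize the results (stubbed here as simple concatenation).
    with ThreadPoolExecutor(max_workers=min(max_agents, len(subtasks))) as pool:
        findings = list(pool.map(sub_agent, subtasks))
    return "\n".join(findings)

report = orchestrate(["pricing history", "benchmark claims", "license terms"])
print(report)
```

In K2.5 this loop runs inside the model's serving stack rather than in client code, which is what distinguishes Agent Swarm from framework-level orchestration.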
Visual Coding: From Design to Implementation
K2.5's native multimodal training enables a capability Moonshot calls Visual Coding. Users can provide UI designs, screenshots, wireframes, or even video demonstrations, and K2.5 generates functional front-end code that reproduces the visual design.
This goes beyond simple image-to-code translation. K2.5 can interpret design intent, infer responsive layout behavior, and generate animations from video references. The workflow supports complete website generation from natural language descriptions combined with visual references, making it a practical tool for rapid prototyping.
Modes of Operation
K2.5 supports multiple operational modes through a single model:
| Mode | Purpose | Use Case |
|---|---|---|
| Instant | Fast responses without reasoning chains | Quick questions, simple tasks |
| Thinking | Extended reasoning with step-by-step analysis | Complex problems, math, coding |
| Agent | Single-agent with tool use | Research, structured content |
| Agent Swarm | Multi-agent parallel execution | Large-scale projects, batch tasks |
The Thinking mode uses configurable reasoning depth, allowing users to balance response quality against latency. In Instant mode, the model responds without generating reasoning traces, providing faster responses for straightforward queries.
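Since the endpoints are OpenAI-compatible, mode selection presumably maps onto the chat-completions request body. The model identifier and the `thinking` field below are illustrative guesses, not documented parameter names:

```python
import json

# Hypothetical request body for K2.5's OpenAI-compatible chat endpoint.
payload = {
    "model": "kimi-k2.5",  # assumed model identifier
    "messages": [{"role": "user", "content": "Summarize this design doc."}],
    "temperature": 0.6,
    "thinking": True,  # assumed vendor extension toggling Thinking vs Instant mode
}

# What an HTTP client would POST to a /v1/chat/completions-style endpoint.
body = json.dumps(payload)
```

Consult Moonshot's API documentation for the real field names before relying on this shape.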
Availability and Access
Kimi K2.5 is released under a Modified MIT License, making both the code and model weights available for commercial use. The model can be accessed through:
- Moonshot API: OpenAI and Anthropic-compatible endpoints at platform.moonshot.ai
- HuggingFace: Full model weights available at huggingface.co/moonshotai/Kimi-K2.5
- Kimi Chat: Consumer interface at kimi.com
- Kimi Code: Dedicated coding product
For self-hosted deployment, Moonshot recommends vLLM, SGLang, or KTransformers as inference engines. The 1 trillion parameter model requires significant hardware for full deployment, but the 32B active parameter count means inference costs are comparable to other models of similar active size.
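A self-hosted launch with vLLM might look like the fragment below. The model path comes from the HuggingFace link above; the parallelism and context-length flags are placeholders that must be sized to the actual GPU cluster:

```shell
# Deployment sketch only; flag values are illustrative, not a tested recipe.
vllm serve moonshotai/Kimi-K2.5 \
  --tensor-parallel-size 8 \
  --max-model-len 262144 \
  --trust-remote-code
```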
A free tier is available with usage limits, and paid plans offer higher capacity for production deployments.
Competitive Positioning
Kimi K2.5 occupies a unique position in the current AI landscape. Its 76.8 percent SWE-bench score makes it the top-performing open-source model, ahead of DeepSeek V3 and Llama 4 Maverick. Against closed-source models, it trails Claude Opus 4.6 (80.8 percent) and Claude Sonnet 4.6 (79.2 percent) but outperforms GPT-5.2 (69 percent) on this benchmark.
The Agent Swarm capability has no direct equivalent in other open-source models. While frameworks like LangChain and AutoGen enable multi-agent orchestration, K2.5 implements this at the model level, eliminating the need for external orchestration infrastructure.
Limitations
Despite its impressive specifications, K2.5 has notable constraints. The 256K context window, while generous, falls short of Claude's 1 million token context and the 10 million token window offered by Llama 4 Scout. For workflows requiring extremely long context, this could be a limiting factor.
The 1 trillion parameter model is demanding to self-host. Organizations wanting to run K2.5 on their own infrastructure need substantial GPU resources, even with the efficient MoE architecture. This may limit adoption to well-resourced organizations or API-based usage.
Agent Swarm, while powerful, is still in beta. Coordinating 100 parallel sub-agents introduces complexity in error handling, result consistency, and cost management. Production deployments should expect some iteration before achieving reliable multi-agent workflows.
Conclusion
Kimi K2.5 represents a significant milestone in open-source AI. The combination of 1 trillion parameters, native multimodal capabilities, and the Agent Swarm system creates a model that is not merely competitive with closed-source alternatives but offers capabilities that most proprietary models lack. For developers and organizations seeking an open-weight model with frontier-class performance and built-in multi-agent orchestration, K2.5 sets a new standard. The modified MIT license ensures commercial viability, and the OpenAI-compatible API makes integration straightforward for teams already working with existing AI infrastructure.
Pros
- Top-performing open-source model on SWE-bench Verified at 76.8%, with strong results across reasoning and multimodal benchmarks
- Agent Swarm provides built-in multi-agent orchestration without external frameworks like LangChain or AutoGen
- Modified MIT License enables commercial use with minimal restrictions on both code and weights
- Native multimodal architecture handles text, images, and video with consistent quality across all modalities
- OpenAI-compatible API makes adoption straightforward for teams with existing AI infrastructure
Cons
- 256K context window is significantly shorter than Claude's 1M tokens or Llama 4 Scout's 10M tokens
- 1 trillion parameter model requires substantial GPU resources for self-hosted deployment despite sparse activation
- Agent Swarm remains in beta with potential challenges in error handling and cost management at scale
- Moonshot AI is a relatively new company compared to established players, raising questions about long-term model support
Key Features
Kimi K2.5 is a 1 trillion parameter open-source MoE model from Moonshot AI with 384 experts and 32B active parameters per token. It features native multimodal pre-training on 15 trillion tokens, a 256K context window, and the Agent Swarm system that coordinates up to 100 parallel sub-agents for 4.5x faster task execution. It achieves 76.8% on SWE-bench Verified (top open-source), 96.1 on AIME 2025, and supports Visual Coding from designs to functional code. Released under Modified MIT License.
Key Insights
- At 1 trillion total parameters with only 3.2% activation per token, K2.5 demonstrates that sparse MoE architectures can deliver frontier performance with practical inference costs
- Agent Swarm enables up to 100 parallel sub-agents with independent tool access, a capability not available in any other open-source model
- The 76.8% SWE-bench Verified score makes K2.5 the strongest open-source coding model, surpassing DeepSeek V3 and Llama 4 Maverick
- Native multimodal pre-training on 15 trillion mixed tokens produces genuine visual reasoning rather than superficial image-to-text translation
- MoonViT 400M parameter vision encoder is integrated at pre-training rather than post-training, enabling seamless cross-modal reasoning
- The Modified MIT License for both code and weights enables commercial deployment with minimal restrictions
- OpenAI and Anthropic-compatible API endpoints lower the integration barrier for existing AI application developers
- Agent Swarm reduces execution time by up to 4.5x for large-scale research and batch processing tasks