Feb 18, 2026

Kimi K2.5: Moonshot AI's 1T Parameter Model Brings Agent Swarm to Open Source

Moonshot AI releases Kimi K2.5, a 1 trillion parameter open-source MoE model with 384 experts, native multimodal capabilities, and an Agent Swarm system that coordinates up to 100 parallel sub-agents.

#Kimi K2.5#Moonshot AI#Agent Swarm#MoE#Open Source LLM

A Trillion Parameters, Open and Agentic

Moonshot AI released Kimi K2.5 on January 27, 2026, delivering what may be the most architecturally ambitious open-source language model to date. With 1 trillion total parameters, a Mixture-of-Experts design spanning 384 experts, and a native multimodal architecture trained on 15 trillion tokens of mixed visual and text data, K2.5 challenges the assumption that frontier-class capabilities require closed, proprietary development.

The model's defining feature is Agent Swarm, a system that allows a single K2.5 instance to dynamically spawn and coordinate up to 100 specialized sub-agents working in parallel. Each sub-agent operates independently with its own tool access, enabling complex multi-step workflows that previously required custom orchestration frameworks. This is not a research demo. Agent Swarm is available through Moonshot's API today.

Architecture: Sparse Efficiency at Scale

Kimi K2.5 employs a Mixture-of-Experts architecture with 384 experts distributed across 61 layers, including one dense layer. Despite its 1 trillion total parameters, the model activates only 32 billion parameters per token by selecting 8 experts plus 1 shared expert for each forward pass. This gives it a 3.2 percent activation rate, meaning the model uses a fraction of its total capacity for any given input while maintaining the knowledge breadth of a much larger model.
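The activation arithmetic is easy to verify directly. A quick sketch, using only the figures quoted above:

```python
# Sanity-check K2.5's sparse-activation figures (numbers from the article).
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion parameters in total
ACTIVE_PARAMS = 32_000_000_000     # 32B activated per token

activation_rate = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Activation rate: {activation_rate:.1%}")   # → 3.2%

# Routing: 8 selected experts plus 1 shared expert per forward pass
experts_per_token = 8 + 1
print(f"Experts consulted per token: {experts_per_token}")
```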

| Specification | Value |
| --- | --- |
| Total Parameters | 1 Trillion |
| Active Parameters | 32B per token |
| Number of Experts | 384 |
| Selected Experts per Token | 8 + 1 shared |
| Layers | 61 (1 dense) |
| Context Length | 256K tokens |
| Vision Encoder | MoonViT (400M params) |
| Vocabulary Size | 160K |
| Attention Mechanism | Multi-head Latent Attention (MLA) |

The MoonViT vision encoder, a 400M parameter component, is integrated at the pre-training level rather than bolted on as a post-training adapter. This native multimodal architecture means K2.5 processes images and video with the same fluency as text, rather than treating visual inputs as translated text descriptions.

Benchmark Performance

Kimi K2.5's benchmark results place it among the top-performing open-source models across multiple domains.

In coding, K2.5 achieves a 76.8 percent score on SWE-bench Verified, making it the strongest open-source model on this widely tracked software engineering benchmark. It also scores 73.0 on SWE-bench Multilingual and 85.0 on LiveCodeBench, demonstrating consistent coding strength across different evaluation frameworks.

In reasoning and knowledge tasks, the model scores 96.1 on AIME 2025, 87.6 on GPQA-Diamond, and 87.1 on MMLU-Pro. These numbers make it competitive with closed-source frontier models on academic reasoning benchmarks.

The multimodal benchmarks reveal the depth of its visual understanding. K2.5 scores 78.5 on MMMU-Pro, 92.3 on OCRBench for text recognition in images, and 87.4 on VideoMME for video comprehension. These results suggest that the native multimodal pre-training approach yields genuine visual reasoning capabilities rather than superficial image-to-text translation.

Agent Swarm: Coordinated Multi-Agent Execution

Agent Swarm is the most distinctive capability in K2.5 and represents a fundamentally different approach to AI agent architecture. Rather than running a single model instance that processes tasks sequentially, Agent Swarm allows K2.5 to dynamically instantiate up to 100 specialized sub-agents that work in parallel.

Each sub-agent receives a specific role and can independently use tools such as web search, code execution, file manipulation, and API calls. The primary K2.5 instance acts as an orchestrator, decomposing complex tasks into subtasks, assigning them to appropriate sub-agents, and synthesizing results.

Moonshot AI reports that Agent Swarm reduces execution time by up to 4.5 times for large-scale research tasks, long-form writing, and batch operations compared to sequential single-agent execution. The benchmarks support this claim: K2.5 with Agent Swarm scores 78.4 on BrowseComp and 79.0 on WideSearch, both agentic search benchmarks that measure the ability to find and synthesize information across multiple sources.

Practical applications include:

  • Research synthesis: Assigning different sub-agents to search for information on different aspects of a topic, then combining findings into a coherent report
  • Batch processing: Running independent analysis tasks across dozens of documents simultaneously
  • Multi-source verification: Dispatching sub-agents to cross-reference claims across multiple databases and websites
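The decompose-dispatch-synthesize pattern described above can be sketched with local stand-ins. Here `sub_agent` is a stub for a K2.5 sub-agent with its own tool access; the real system implements this routing inside the model rather than in application code:

```python
# Minimal sketch of the orchestrator pattern Agent Swarm implements at the
# model level: decompose a task, fan subtasks out in parallel, synthesize.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(subtask: str) -> str:
    # A real sub-agent would independently use tools such as web search,
    # code execution, file manipulation, or API calls.
    return f"findings for {subtask!r}"

def orchestrate(task: str, subtasks: list[str], max_agents: int = 100) -> str:
    # K2.5 caps the swarm at 100 parallel sub-agents.
    with ThreadPoolExecutor(max_workers=min(len(subtasks), max_agents)) as pool:
        results = list(pool.map(sub_agent, subtasks))
    # Synthesis step: the orchestrator combines sub-agent outputs.
    return f"{task}: " + "; ".join(results)

report = orchestrate("survey", ["benchmarks", "pricing", "licensing"])
print(report)
```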

Visual Coding: From Design to Implementation

K2.5's native multimodal training enables a capability Moonshot calls Visual Coding. Users can provide UI designs, screenshots, wireframes, or even video demonstrations, and K2.5 generates functional front-end code that reproduces the visual design.

This goes beyond simple image-to-code translation. K2.5 can interpret design intent, infer responsive layout behavior, and generate animations from video references. The workflow supports complete website generation from natural language descriptions combined with visual references, making it a practical tool for rapid prototyping.
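Since Moonshot's API is OpenAI-compatible, a Visual Coding call might be assembled as below. The payload shape (content parts with an `image_url` entry) follows the OpenAI convention, and the model name `kimi-k2.5` is illustrative; check Moonshot's API reference for the exact fields it accepts:

```python
import base64

def visual_coding_request(image_bytes: bytes, instruction: str) -> dict:
    """Assemble an OpenAI-style multimodal chat payload (not sent here)."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": "kimi-k2.5",  # illustrative model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# Placeholder bytes stand in for a real screenshot or wireframe image.
payload = visual_coding_request(
    b"\x89PNG...placeholder...",
    "Reproduce this wireframe as responsive HTML/CSS",
)
```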

Modes of Operation

K2.5 supports multiple operational modes through a single model:

| Mode | Purpose | Use Case |
| --- | --- | --- |
| Instant | Fast responses without reasoning chains | Quick questions, simple tasks |
| Thinking | Extended reasoning with step-by-step analysis | Complex problems, math, coding |
| Agent | Single-agent with tool use | Research, structured content |
| Agent Swarm | Multi-agent parallel execution | Large-scale projects, batch tasks |

The Thinking mode uses configurable reasoning depth, allowing users to balance response quality against latency. In Instant mode, the model responds without generating reasoning traces, providing faster responses for straightforward queries.

Availability and Access

Kimi K2.5 is released under a Modified MIT License, making both the code and model weights available for commercial use. The model can be accessed through:

  • Moonshot API: OpenAI and Anthropic-compatible endpoints at platform.moonshot.ai
  • HuggingFace: Full model weights available at huggingface.co/moonshotai/Kimi-K2.5
  • Kimi Chat: Consumer interface at kimi.com
  • Kimi Code: Dedicated coding product

For self-hosted deployment, Moonshot recommends vLLM, SGLang, or KTransformers as inference engines. The 1 trillion parameter model requires significant hardware for full deployment: per-token compute is comparable to a roughly 32B dense model, but the full trillion-parameter weights must still be resident in memory.
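To make the hardware requirement concrete, here is a back-of-the-envelope memory estimate. The 8-bit-weights assumption and the 80 GB GPU size are illustrative choices, not Moonshot's deployment guidance:

```python
# Rough estimate of GPUs needed just to hold the weights of a 1T-parameter
# checkpoint, ignoring KV cache and activation memory.
TOTAL_PARAMS = 1_000_000_000_000
BYTES_PER_PARAM = 1        # assumes 8-bit quantized weights
GPU_MEMORY_GB = 80         # e.g. one 80 GB accelerator

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
gpus_needed = -(-weights_gb // GPU_MEMORY_GB)   # ceiling division
print(f"~{weights_gb:.0f} GB of weights; at least {gpus_needed:.0f} x 80 GB GPUs")
```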

A free tier is available with usage limits, and paid plans offer higher capacity for production deployments.

Competitive Positioning

Kimi K2.5 occupies a unique position in the current AI landscape. Its 76.8 percent SWE-bench score makes it the top-performing open-source model, ahead of DeepSeek V3 and Llama 4 Maverick. Against closed-source models, it trails Claude Opus 4.6 (80.8 percent) and Claude Sonnet 4.6 (79.2 percent) but outperforms GPT-5.2 (69 percent) on this benchmark.

The Agent Swarm capability has no direct equivalent in other open-source models. While frameworks like LangChain and AutoGen enable multi-agent orchestration, K2.5 implements this at the model level, eliminating the need for external orchestration infrastructure.

Limitations

Despite its impressive specifications, K2.5 has notable constraints. The 256K context window, while generous, falls short of Claude's 1 million token context and the 10 million token window offered by Llama 4 Scout. For workflows requiring extremely long context, this could be a limiting factor.

The 1 trillion parameter model is demanding to self-host. Organizations wanting to run K2.5 on their own infrastructure need substantial GPU resources, even with the efficient MoE architecture. This may limit adoption to well-resourced organizations or API-based usage.

Agent Swarm, while powerful, is still in beta. Coordinating 100 parallel sub-agents introduces complexity in error handling, result consistency, and cost management. Production deployments should expect some iteration before achieving reliable multi-agent workflows.

Conclusion

Kimi K2.5 represents a significant milestone in open-source AI. The combination of 1 trillion parameters, native multimodal capabilities, and the Agent Swarm system creates a model that is not merely competitive with closed-source alternatives but offers capabilities that most proprietary models lack. For developers and organizations seeking an open-weight model with frontier-class performance and built-in multi-agent orchestration, K2.5 sets a new standard. The modified MIT license ensures commercial viability, and the OpenAI-compatible API makes integration straightforward for teams already working with existing AI infrastructure.

Pros

  • Top-performing open-source model on SWE-bench Verified at 76.8%, with strong results across reasoning and multimodal benchmarks
  • Agent Swarm provides built-in multi-agent orchestration without external frameworks like LangChain or AutoGen
  • Modified MIT License enables commercial use with minimal restrictions on both code and weights
  • Native multimodal architecture handles text, images, and video with consistent quality across all modalities
  • OpenAI-compatible API makes adoption straightforward for teams with existing AI infrastructure

Cons

  • 256K context window is significantly shorter than Claude's 1M tokens or Llama 4 Scout's 10M tokens
  • 1 trillion parameter model requires substantial GPU resources for self-hosted deployment despite sparse activation
  • Agent Swarm remains in beta with potential challenges in error handling and cost management at scale
  • Moonshot AI is a relatively newer company compared to established players, raising questions about long-term model support


Key Features

Kimi K2.5 is a 1 trillion parameter open-source MoE model from Moonshot AI with 384 experts and 32B active parameters per token. It features native multimodal pre-training on 15 trillion tokens, a 256K context window, and the Agent Swarm system that coordinates up to 100 parallel sub-agents for 4.5x faster task execution. It achieves 76.8% on SWE-bench Verified (top open-source), 96.1 on AIME 2025, and supports Visual Coding from designs to functional code. Released under Modified MIT License.

Key Insights

  • At 1 trillion total parameters with only 3.2% activation per token, K2.5 demonstrates that sparse MoE architectures can deliver frontier performance with practical inference costs
  • Agent Swarm enables up to 100 parallel sub-agents with independent tool access, a capability not available in any other open-source model
  • The 76.8% SWE-bench Verified score makes K2.5 the strongest open-source coding model, surpassing DeepSeek V3 and Llama 4 Maverick
  • Native multimodal pre-training on 15 trillion mixed tokens produces genuine visual reasoning rather than superficial image-to-text translation
  • MoonViT 400M parameter vision encoder is integrated at pre-training rather than post-training, enabling seamless cross-modal reasoning
  • The Modified MIT License for both code and weights enables unrestricted commercial deployment
  • OpenAI and Anthropic-compatible API endpoints lower the integration barrier for existing AI application developers
  • Agent Swarm reduces execution time by up to 4.5x for large-scale research and batch processing tasks
