Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Kimi K2 is Moonshot AI's open-weight large language model series built around a mixture-of-experts (MoE) architecture with 1 trillion total parameters and 32 billion activated per token. Released under a modified MIT License with both code and weights published on Hugging Face, K2 targets the same workloads as proprietary frontier models while remaining fully self-hostable on commodity inference stacks. By May 2026 the GitHub repository has crossed 10,700 stars and become one of the most-watched open-weight model releases of the year.

## Architecture

Kimi K2 uses a 61-layer transformer with one dense layer and sixty MoE layers. Each MoE layer routes tokens across 384 experts, selecting 8 experts per token, which keeps the activated parameter count near 32B even though total parameters reach 1T. The attention block uses Multi-head Latent Attention (MLA) with 7168-dimensional hidden states, and the feed-forward path uses SwiGLU activations. A 160K-entry vocabulary supports multilingual input efficiently, and the context window is 128K tokens.

The model was pre-trained on 15.5 trillion tokens. Moonshot reports the run completed with zero training instabilities thanks to a custom adaptation of the Muon optimizer scaled to this parameter count, which the team treats as one of the headline contributions of the project. Block-FP8 quantized weights are published alongside the BF16 checkpoints so inference frameworks can load the model on a single 8-GPU node.

## Benchmark Results

The instruct variant posts competitive scores against closed-source frontier models. On MMLU it reaches 89.5%, on AIME mathematics 69.6%, and on LiveCodeBench 53.7%. The most distinctive results are in agentic workloads: SWE-bench single-attempt sits at 65.8%, and AceBench tool-use evaluation reaches 76.5%. These figures put K2 within range of the closed reference models that were the baseline a few months earlier, while making the weights freely downloadable.

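
The expert-routing arithmetic from the Architecture section can be made concrete with a small sketch. The code below picks the 8 highest-scoring experts out of 384 for a single token and normalizes their weights. The function names and the softmax-over-selected-logits weighting are assumptions for illustration only; the actual K2 gating function may differ.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts per MoE layer (figure from the text)
TOP_K = 8          # experts selected per token (figure from the text)


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only.
    This is a common MoE choice, not a confirmed detail of K2.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Subtract the peak logit before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_expert_fraction(num_experts=NUM_EXPERTS, top_k=TOP_K):
    """Fraction of routed experts that process any single token."""
    return top_k / num_experts


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    picks = route_token(logits)
    print(len(picks), "experts selected")
    print(f"{active_expert_fraction():.2%} of routed experts active per token")
```

Only about 2% of the routed experts run for any given token, while 32B of 1T parameters (3.2%) are reported as activated. The gap is presumably accounted for by components that run for every token, such as attention and the dense layer, though the source does not break this down.
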
## Deployment Options

The official inference paths include vLLM, SGLang, KTransformers, and TensorRT-LLM. Moonshot also runs a hosted API that is wire-compatible with both the OpenAI and Anthropic SDKs, which lowers the integration cost for teams that already use those clients. The repository documents recommended sharding strategies (typically tensor parallelism plus expert parallelism) and provides reference launch commands for each backend.

For smaller environments the FP8 weights can be loaded across 8x H100 or 8x H200 GPUs. CPU offload paths via KTransformers are documented for teams that want to experiment without dedicated GPU hardware, although throughput is significantly lower in that mode.

## Tooling and Agentic Use

The README places strong emphasis on tool-use and agentic coding scenarios. Example notebooks show K2 driving a shell, browser, and code execution loop, and the model card includes recommended system prompts for SWE-bench-style agentic harnesses. Output formatting was tuned so that JSON and function-call responses parse reliably, which matters for production agents that depend on schema fidelity.

## Use Cases

Kimi K2 is positioned for teams that need frontier-tier reasoning quality without sending data to a third-party API. Enterprise deployments that have hard data-residency requirements can run the model entirely inside a private VPC. Research groups can fine-tune or distill the model under the modified MIT terms, which permit commercial use with attribution. Agent platform builders get a model that handles long tool chains and large context windows without dropping into degraded modes.

The long context window also makes K2 a reasonable choice for code-base-level analysis tasks, document QA over large corpora, and multi-step planning agents that need to retain conversation state across many turns.

## Limitations

Running a 1T-parameter MoE model is not cheap even when only 32B parameters are active per token.
An 8x H100 node is effectively the minimum viable inference target for production latency, which is well beyond the budget of most individual developers. The community quantizations that run on smaller hardware give up measurable quality, particularly on math and code benchmarks.

The license is a modified MIT license rather than plain MIT or Apache-2.0. The modifications add attribution requirements that some downstream projects will need to review with legal counsel before integrating.

The model's multilingual coverage skews toward Chinese and English; non-Latin scripts outside CJK get less attention in the training mix, and that shows up in evaluation.

Finally, while agentic coding scores are strong, the model still benefits from a well-designed harness and tool description prompt, so out-of-the-box behavior in a naive agent loop is not as good as the headline benchmark numbers suggest.

## Who Should Use Kimi K2

Kimi K2 fits teams that already have GPU infrastructure suitable for hosting frontier-class models and that want an open-weight alternative to GPT-class or Claude-class APIs. It is a strong base model for organizations building proprietary agent systems on top of an LLM they control. Independent researchers benefit from the open weights and detailed architectural disclosures, which make K2 a useful artifact for studying large-scale MoE training and the practical behavior of the Muon optimizer at trillion-parameter scale.
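
As a rough illustration of what a "well-designed harness" might check, the sketch below validates a function-call response before an agent acts on it. It assumes the OpenAI-style tool-call shape, in which `function.arguments` arrives as a JSON string. The `run_shell` tool and the `validate_tool_call` helper are hypothetical names invented for this example and do not come from the K2 repository.

```python
import json

# Hypothetical tool schema for illustration; not taken from the K2 repository.
RUN_SHELL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}


def validate_tool_call(tool_call, tool=RUN_SHELL_TOOL):
    """Return the parsed arguments, or raise ValueError if the call is malformed.

    Checks the tool name, that the arguments string is valid JSON, that it
    decodes to an object, and that every required key is present.
    """
    spec = tool["function"]
    call = tool_call.get("function", {})
    if call.get("name") != spec["name"]:
        raise ValueError(f"unexpected tool name: {call.get('name')!r}")
    raw = call.get("arguments")
    if not isinstance(raw, str):
        raise ValueError("arguments must be a JSON string")
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    missing = [k for k in spec["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args


if __name__ == "__main__":
    call = {"function": {"name": "run_shell", "arguments": '{"command": "ls -la"}'}}
    print(validate_tool_call(call))
```

Rejecting a malformed call and re-prompting the model is usually cheaper than letting a bad argument reach the shell, which is one reason harness design affects results in agent loops.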