Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
slime is an open-source LLM post-training framework built for reinforcement learning at scale. Developed by THUDM (the team behind the GLM models), it connects Megatron for high-performance training with SGLang for fast rollout, giving researchers a single, coherent path for the RL loop instead of a loose stack of disconnected trainers, serving engines, and agent frameworks. With roughly 7,000 GitHub stars, slime has quickly become one of the most battle-tested open RL post-training stacks available.

## Megatron + SGLang in One Loop

slime's core idea is that training and data generation should reinforce each other. Megatron handles the main training process while SGLang serves rollouts, and both flow through the same training / rollout / Data Buffer path. By committing to a single rollout backend, slime can use SGLang-specific capabilities (routing, caching, disaggregation, and weight synchronization) directly, rather than flattening multiple inference engines into a lowest-common-denominator abstraction.

## Native Argument Pass-Through

A defining design choice is native pass-through. slime reads Megatron arguments directly and exposes every installed SGLang argument with a `--sglang-` prefix, so upstream improvements to either engine remain available without wrapper code. This keeps the framework lightweight and close to the engines it builds on, letting teams adopt new training and serving optimizations as the underlying projects evolve.

## Flexible Data Generation

slime treats math, code, search, tools, sandboxes, verifiers, environments, and multi-agent or long-horizon agentic workflows as data-generation or reward workflows that plug in without forking the training kernel. Custom data-generation interfaces and server-based engines make it possible to build arbitrary RL data pipelines, supporting everything from simple verifiable rewards to complex agentic rollouts.
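To make the idea of a pluggable reward workflow concrete, here is a minimal, framework-agnostic sketch of a "simple verifiable reward" wired into a rollout loop. It does not use slime's actual interfaces: `Sample`, `exact_match_reward`, `REWARD_REGISTRY`, and `generate_batch` are hypothetical names chosen for illustration, and the `policy` callable stands in for a real rollout engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Sample:
    """One rollout record (hypothetical structure, not slime's own type)."""
    prompt: str
    response: str
    label: str
    reward: float = 0.0


# A reward workflow is just a function from a finished sample to a score.
RewardFn = Callable[[Sample], float]


def exact_match_reward(sample: Sample) -> float:
    """Verifiable reward: 1.0 if the last non-empty response line equals the label."""
    lines = [ln.strip() for ln in sample.response.splitlines() if ln.strip()]
    answer = lines[-1] if lines else ""
    return 1.0 if answer == sample.label.strip() else 0.0


# New reward workflows are registered here; the rollout loop never changes.
REWARD_REGISTRY: Dict[str, RewardFn] = {"exact_match": exact_match_reward}


def generate_batch(
    tasks: List[Tuple[str, str]],
    policy: Callable[[str], str],
    reward_name: str,
) -> List[Sample]:
    """Run the policy on each (prompt, label) pair and score the result."""
    reward_fn = REWARD_REGISTRY[reward_name]
    batch = []
    for prompt, label in tasks:
        sample = Sample(prompt=prompt, response=policy(prompt), label=label)
        sample.reward = reward_fn(sample)
        batch.append(sample)
    return batch
```

In this sketch, swapping in a code-execution or tool-use verifier would mean adding another entry to `REWARD_REGISTRY` while `generate_batch` stays untouched, which is the kind of separation between data generation and the training path that the section describes.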
## Battle-Tested at Frontier Scale

slime is the RL framework behind the GLM family (GLM-4.5 through GLM-5.2) and also supports Qwen3 series, DeepSeek V3/V3.1/R1, and Llama 3. Because RL bugs are often silent, the project treats correctness as a first-class concern, maintaining CPU unit tests, contract tests for customization hooks, and GPU end-to-end tests covering dense and MoE models, checkpointing, numerical precision, async rollout, and PPO-style workflows.

## Considerations

slime is deliberately opinionated: it optimizes deeply for the Megatron + SGLang path rather than supporting many backends, so teams committed to a different stack may find it less flexible. Large-scale RL post-training also remains resource-intensive and operationally complex. For groups building release-grade models who want explicit dataflow and reproducible RL infrastructure, though, slime is among the most credible open frameworks in the space.