Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
SGLang (Structured Generation Language) is a high-throughput, low-latency inference engine for large language models and multimodal models, developed by the LMSYS team. With 26,600 GitHub stars and over 12,000 commits, it has become the de facto open-source infrastructure standard for LLM deployment in 2026, running across more than 400,000 GPUs at organizations including xAI, NVIDIA, AMD, LinkedIn, Google Cloud, and AWS.

The framework's flagship innovation is RadixAttention, a prefix-caching mechanism that automatically reuses KV-cache activations across requests sharing common prefixes, such as system prompts, RAG context, and few-shot examples. Compared to frameworks without automatic KV-cache reuse, this delivers up to 5x faster inference and 6x higher throughput on real-world workloads; a minimal sketch of the underlying idea appears below. In February 2026, SGLang unlocked a 25x inference performance improvement on NVIDIA GB300 NVL72 hardware, and independent 2026 benchmarks consistently rank it at the top of open-source inference engines, at approximately 16,200 tokens per second on H100 GPUs, a 29% throughput advantage over vLLM.

SGLang supports all major model architectures (Llama, Qwen, DeepSeek, GPT variants, and diffusion models) and runs on NVIDIA GB200/H100/A100 GPUs, AMD MI-series accelerators, Intel CPUs, Google TPUs, and Huawei Ascend NPUs. It provides OpenAI-compatible APIs for drop-in replacement of existing inference stacks (see the client example below) and also serves as the inference backbone for RL post-training frameworks including verl and Tunix.
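To make the prefix-reuse idea concrete, here is a minimal, self-contained Python sketch. It models the cache as a plain token-level trie and only counts how many leading tokens a new request can reuse; the names (`PrefixCache`, `longest_prefix`) are illustrative, and the real RadixAttention is a radix tree over GPU KV-cache blocks with scheduling and eviction policies that this toy omits.

```python
# Toy illustration of prefix caching: NOT SGLang's implementation.
# A real engine attaches GPU KV-cache blocks to tree nodes; here we
# only measure how much of a new request's prefix is already cached.

from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        # Children keyed by token id; a real engine would also hold
        # references to the KV-cache pages computed for this token.
        self.children: Dict[int, "TrieNode"] = {}


class PrefixCache:
    """Token-level trie: the longest matched prefix can skip prefill."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tokens: List[int]) -> None:
        """Record a request's tokens so later requests can reuse them."""
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())

    def longest_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for t in tokens:
            if t not in node.children:
                break
            node = node.children[t]
            matched += 1
        return matched


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = [101, 7, 42, 9]      # shared system-prompt tokens
    req_a = system_prompt + [55, 66]
    req_b = system_prompt + [77, 88, 99]

    cache.insert(req_a)                   # first request populates the cache
    hit = cache.longest_prefix(req_b)     # second request reuses the prefix
    print(f"reused {hit} of {len(req_b)} tokens")  # reused 4 of 7 tokens
```

The same principle explains why workloads with long shared system prompts or RAG context benefit most: the deeper the shared prefix, the larger the fraction of prefill computation that can be skipped.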
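Because the server speaks the OpenAI API, pointing an existing stack at SGLang is typically a one-line change of `base_url`. The snippet below is a sketch that assumes an SGLang server launched locally on its default port (30000); the model path and prompt are placeholders.

```python
# Drop-in use of SGLang's OpenAI-compatible endpoint via the standard
# openai client. Assumes a server was started locally, for example:
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct
# The port and model name below are illustrative defaults.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",  # point the client at SGLang
    api_key="EMPTY",                       # a local server needs no real key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize RadixAttention in one sentence."},
    ],
    temperature=0.2,
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Note that repeated calls with the same system message are exactly the pattern RadixAttention exploits: every request after the first shares the system-prompt prefix and reuses its cached KV activations.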