
SGLang

sgl-project · Apache-2.0

Inference · 24.9K Stars · 5.0K Forks · 117 views

SGLang is a high-performance serving framework for large language models and multimodal models, now the de facto industry standard, deployed on over 400,000 GPUs worldwide. It introduces RadixAttention for KV cache reuse, a zero-overhead CPU scheduler, and compressed finite state machines for faster structured output decoding. SGLang supports a wide range of models, including Llama, Qwen, DeepSeek, Kimi, GLM, and diffusion models, and runs on NVIDIA, AMD, Intel, Google TPU, and Ascend NPU hardware.
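To make the idea of KV cache reuse concrete, here is a minimal sketch of prefix matching over token IDs. It is a toy, uncompressed trie written for illustration only: the names `PrefixCache`, `match_prefix`, and `tokens_to_prefill` are hypothetical and are not SGLang's API, and a real radix tree would also compress edges, store KV tensors, and evict entries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps a token id to the child node reached by that token.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy token-level trie (hypothetical; not SGLang's data structure)."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record a token sequence so later requests can reuse its prefix."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())


def tokens_to_prefill(cache, tokens):
    """Count tokens that still need prefill, then cache the full sequence."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return len(tokens) - reused
```

In this sketch, a second request that shares a system prompt with an earlier one only pays prefill cost for the tokens after the shared prefix.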

Key Features

RadixAttention for KV cache reuse and prefix caching
Zero-overhead CPU scheduler with prefill-decode disaggregation
Structured output decoding via compressed finite state machines
Multi-hardware support: NVIDIA, AMD, Intel, Google TPU, Ascend NPU
Wide model support: Llama, Qwen, DeepSeek, Kimi, GLM, diffusion models
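The structured output feature above can be illustrated with a small sketch of the general idea behind compressing a finite state machine: when a state allows exactly one next character, that run of forced characters can be emitted in one step instead of being decoded one at a time. This is a character-level toy under stated assumptions; the function names and the example automaton are hypothetical, and a production system would work over tokenizer vocabularies and automata derived from a schema or regex.

```python
def add_path(fsm, start, text, states):
    """Add a chain of single-character transitions: start -> states[0] -> ..."""
    current = start
    for char, nxt in zip(text, states):
        fsm.setdefault(current, {})[char] = nxt
        current = nxt


def build_example_fsm():
    """Hypothetical automaton accepting {"ok":true} or {"ok":false}."""
    fsm = {}
    add_path(fsm, 0, '{"ok":', [1, 2, 3, 4, 5, 6])
    add_path(fsm, 6, "true", [7, 8, 9, 10])
    add_path(fsm, 6, "false", [11, 12, 13, 14, 10])
    add_path(fsm, 10, "}", [15])
    return fsm


def jump_forward(fsm, state):
    """Follow transitions while exactly one character is allowed.

    Returns the forced text and the state where a real choice (or the end)
    is reached. The `seen` set guards against single-transition cycles.
    """
    forced, seen = [], set()
    while len(fsm.get(state, {})) == 1 and state not in seen:
        seen.add(state)
        ((char, next_state),) = fsm[state].items()
        forced.append(char)
        state = next_state
    return "".join(forced), state
```

With this example automaton, the model would only need to make one real decision (the first character of `true` or `false`); everything before and after that choice is forced by the grammar.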

Related Projects

Ollama

llama.cpp

Unsloth
