Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
SGLang is a high-performance serving framework for large language models and multimodal models, now the de facto industry standard deployed on over 400,000 GPUs worldwide. It introduces RadixAttention for KV cache reuse, a zero-overhead CPU scheduler, and compressed finite state machines for faster structured output decoding. SGLang supports a wide range of models including Llama, Qwen, DeepSeek, Kimi, GLM, and diffusion models, and runs across NVIDIA, AMD, Intel, Google TPU, and Ascend NPU hardware.
ollama
The simplest way to run LLMs locally, with 165K+ GitHub stars. It offers one-command deployment, 100+ models, a REST API, and multi-platform support.
ggml-org
Pure C/C++ LLM inference engine supporting CPUs, Apple Silicon, CUDA, and Vulkan.
vLLM Project
A high-throughput, memory-efficient LLM inference and serving engine built around PagedAttention, with an OpenAI-compatible API and support for 200+ models.
unslothai
2x faster LLM fine-tuning with 70% less VRAM via custom Triton kernels. Supports Llama, Qwen, DeepSeek, Gemma, and 500+ models.