Trending

llama.cpp

ggml-orgMIT

Inference100.9K Stars16.2K Forks148 views

llama.cpp is a pure C/C++ LLM inference engine with no external dependencies, enabling high-performance local inference across a wide range of hardware. It supports Apple Silicon via Metal, NVIDIA CUDA, AMD ROCm, Intel SYCL, Vulkan, and ARM NEON — from Raspberry Pi 5 boards to multi-GPU servers. In March 2026, it crossed 100,000 GitHub stars, making it the fastest open-source AI project to reach that milestone, with over 700 contributors and 3,800+ merged pull requests in 2025 alone.

Key Features

Zero external dependency C/C++ runtime for maximum portability
Broad hardware support: Apple Metal, CUDA, ROCm, SYCL, Vulkan, ARM NEON
GGUF model format with efficient quantization (Q2_K to Q8_0)
OpenAI-compatible HTTP server for drop-in API integration
Active community with 700+ contributors and continuous performance improvements

Open Source

llama.cpp

Key Features

Tags

Related Projects

Ollama

Unsloth

SGLang

SGLang