Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
NVIDIA
TensorRT-LLM is NVIDIA's open-source library for optimizing Large Language Model inference on NVIDIA GPUs. It provides an easy-to-use Python API to define LLMs and applies state-of-the-art optimizations, including custom attention kernels, in-flight batching, paged KV caching, and quantization techniques such as FP8, FP4, INT4 AWQ, and INT8 SmoothQuant. Built on top of TensorRT and PyTorch, it delivers industry-leading throughput and latency for production LLM serving, with support for MoE models and the Blackwell GPU architecture.
ollama
A simple way to run LLMs locally (165K+ GitHub stars): one-command deployment, 100+ models, a REST API, and multi-platform support.
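As a minimal sketch of what calling Ollama's documented REST API looks like, assuming a local server on the default port 11434 (the model name below is illustrative, not prescribed by this listing):

```python
import json
import urllib.request

def build_generate_payload(prompt, model="llama3.2", stream=False):
    # Fields documented for Ollama's POST /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(prompt, model="llama3.2", host="http://localhost:11434"):
    """Send a single non-streaming generation request to a local Ollama server."""
    data = json.dumps(build_generate_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response is a single JSON object with a "response" field.
        return json.loads(resp.read())["response"]
```

With a running server, `ollama run llama3.2` pulls and starts the model, after which `generate("Why is the sky blue?")` returns the completion text.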
ggml-org
Pure C/C++ LLM inference engine supporting CPUs, Apple Silicon, CUDA, and Vulkan.
sgl-project
High-performance LLM and multimodal model serving framework with RadixAttention and structured generation.
mlc-ai
Universal LLM deployment engine using ML compilation for cloud, mobile, and web.