Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Explore the latest AI open-source projects from GitHub and HuggingFace.
LiteLLM is a Python SDK and self-hosted AI Gateway that lets teams call 100+ LLM providers — OpenAI, Anthropic, Gemini, Bedrock, Azure, VertexAI, Cohere, SageMaker, HuggingFace, vLLM, NVIDIA NIM, and dozens more — through a single OpenAI-compatible interface. With 49,961 GitHub stars under the MIT license, it has become the default abstraction layer for organizations that need to route, meter, observe, and govern LLM traffic across multiple providers without rewriting application code each time a new model ships. ## The Problem It Solves In 2026, no serious AI application talks to only one LLM provider. Cost-sensitive routes go to open-weight models on a self-hosted vLLM instance; high-stakes reasoning goes to Claude Opus or GPT; on-device or fine-tuned workloads go to Bedrock, Azure, or a private endpoint. Without a gateway, every team ends up with provider-specific clients scattered across their codebase, inconsistent error handling, no unified cost view, and brittle fallback logic written ad hoc. LiteLLM solves this with one client call — `completion(model=..., messages=...)` — that maps to whichever provider the model name resolves to, with retries, rate limits, and observability handled centrally rather than per provider. ## OpenAI-Compatible Interface The core design choice is OpenAI compatibility: every request and response follows the OpenAI Chat Completions schema, regardless of whether the underlying provider is Anthropic, Gemini, Bedrock, or a self-hosted vLLM server. This means existing application code written against the OpenAI SDK can be pointed at a LiteLLM endpoint by changing only the base URL, and any framework that already supports OpenAI — LangChain, LlamaIndex, Vercel AI SDK, OpenWebUI, Cursor, Continue, Cline — works against the LiteLLM gateway without modification. Errors are translated into OpenAI-format error responses, which makes downstream error-handling code uniform across providers and lets teams build retry and circuit-breaker logic once. ## Proxy Server: The AI Gateway LiteLLM ships as both a Python library for direct embedding and a self-hosted proxy server that acts as an AI Gateway for an entire organization. The proxy provides virtual key management, where each team, project, or end user gets a key with its own rate limits, model allowlist, and budget. It tracks spend per key, per user, per project, and per model in real time, which is the single most useful feature for any company doing AI cost allocation. An Admin Dashboard UI gives operators visibility into request volume, latency percentiles, error rates by provider, and budget consumption — the same operational surface that internal-platform teams used to build from scratch on top of Datadog dashboards. ## Load Balancing, Fallbacks, and Reliability The routing layer supports multi-deployment load balancing — the same model can be backed by Azure OpenAI East, Azure West, OpenAI direct, and a self-hosted vLLM instance, with traffic distributed by latency, cost, or simple round-robin. Fallback chains let a request that fails on the primary deployment automatically retry on a secondary, which keeps applications running through provider outages without exposing the underlying failure to end users. The proxy reports 8ms P95 routing latency at 1,000 requests per second, which puts it well inside the budget for most production paths where the LLM call itself dominates. ## Observability and Guardrails Observability callbacks ship for Lunary, MLflow, Langfuse, Helicone, Datadog, Prometheus, and OpenTelemetry, so every request, response, latency, and token count can be exported to whatever monitoring stack a team already runs. The guardrails layer integrates with Lakera, Aporia, AIM, PromptLayer, and OpenAI Moderation for content filtering and prompt-injection detection, applied uniformly regardless of the underlying provider. PII redaction and custom guardrails written in Python can be inserted at the request or response stage, which centralizes safety enforcement at the gateway level instead of scattering it across applications. ## Why It Has Become Infrastructure At nearly 50,000 GitHub stars and 8,777 forks, LiteLLM has crossed the threshold from project to infrastructure component. The fork count signals heavy internal adoption — teams running customized versions in production — and the active issue volume (3,485 open issues against a single repository) reflects the breadth of the integration surface it has taken on. The MIT license, OpenAI-compatible API, and clean separation between SDK and proxy server are the three decisions that made this trajectory possible: any team can adopt LiteLLM without licensing friction, port existing OpenAI code in minutes, and decide independently whether they want it as a library or as a deployed gateway. In 2026, when LLM provider counts have grown to dozens and cost allocation has become a real budget line, LiteLLM is the open-source answer to the question of how to keep AI infrastructure portable.