Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
Headroom is an Apache-2.0 licensed Python and TypeScript project by Tejas Chopra that compresses everything an AI agent feeds into an LLM (tool outputs, log files, RAG chunks, source files, and conversation history) before it takes up space in the context window. With 10,000+ GitHub stars and 660+ forks in its first months on GitHub, it landed at the top of GitHub Trending by attacking a problem every agent developer is now hitting: context windows fill up faster than tasks complete.

## The Compression Strategy

Headroom is not a single algorithm. It is a router that dispatches different content types to specialized compressors. A `ContentRouter` inspects the payload and chooses between `SmartCrusher` for JSON and structured data, `CodeCompressor` for AST-aware reduction of source files across multiple languages, and `Kompress-base`, a small HuggingFace model trained on real agent traces, for prose. A `CacheAligner` deliberately stabilizes prefixes so that downstream provider KV cache hit rates improve, which compounds the token savings into latency and cost wins. A reversible compression path called CCR stores originals locally and lets the LLM request the uncompressed version on demand, so information loss is not strictly destructive.

The documented token savings on real workloads are 92% on code search, 92% on incident debugging, 73% on issue triage, and 47% on codebase exploration. The headline range of 60-95% reduction is therefore workload-dependent rather than a flat compression ratio, and the codebase-exploration figure falls below it.

## Three Deployment Modes

Headroom ships as three integrations against the same compression engine. The Python and TypeScript library exposes a direct `compress()` call. The HTTP proxy server is a drop-in shim that requires zero code changes: any agent or framework that talks to a chat completions endpoint can route through it.
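In practice, routing through a drop-in proxy is usually a configuration change: the client keeps sending the same request, only to a different host. The sketch below illustrates that idea with the Python standard library. The proxy address `http://localhost:8787` and the helper `route_via_proxy` are hypothetical, chosen for illustration; the article does not state where Headroom's proxy listens or how it is configured.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical address: an assumption for illustration, not a documented default.
ASSUMED_PROXY_BASE = "http://localhost:8787"


def route_via_proxy(endpoint_url: str, proxy_base: str = ASSUMED_PROXY_BASE) -> str:
    """Swap the scheme and host for the proxy's, keeping path and query intact."""
    target = urlsplit(endpoint_url)
    proxy = urlsplit(proxy_base)
    return urlunsplit(
        (proxy.scheme, proxy.netloc, target.path, target.query, target.fragment)
    )


upstream = "https://api.openai.com/v1/chat/completions"
print(route_via_proxy(upstream))
# prints: http://localhost:8787/v1/chat/completions
```

Because the path is left untouched, a client that already speaks the chat completions protocol would need no other change under these assumptions.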
The MCP server exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` as tools that Claude Code, Cursor, and other MCP-capable clients can call directly. This matters because the three modes cover the three real integration patterns developers use. Library mode fits new applications, proxy mode fits existing applications you cannot modify, and MCP mode fits agentic IDEs where the agent itself decides when to compress.

## Benchmark Discipline

The project publishes side-by-side numbers on standard evals, which is unusually transparent for a compression library. GSM8K math accuracy is preserved exactly (0.870 baseline, 0.870 compressed). TruthfulQA actually improves slightly (+0.030 delta), which suggests the noise reduction effect is real on factual tasks. SQuAD v2 retains 97% of baseline accuracy at 19% compression, and BFCL tool-use benchmarks retain 97% at 32% compression. These are not lossless numbers, but they are honest, and they are exactly the kind of evals an engineer needs to decide whether compression is safe for their workload.

## Ecosystem Compatibility

Documented integrations include Claude Code, OpenAI Codex, Cursor, Aider, any OpenAI-compatible client, LangChain, Agno, and Strands. Because the proxy mode is provider-neutral, the project is effectively LLM-vendor agnostic: anything that speaks the OpenAI chat completions wire protocol works.

## Honest Limitations

Headroom requires Python 3.10 or newer and runs as a local process, which means it does not work inside fully sandboxed execution environments where spawning a local binary is blocked. The MCP server pattern assumes the client supports MCP, which currently means Claude Code, Cursor, and a small but growing set of agents. Compression itself adds CPU latency on the request path, so for very short tool outputs the overhead can exceed the savings. The library is best used on payloads above a few hundred tokens.
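One way to act on the payload-size advice is a simple gate that skips compression for short inputs. The sketch below is an illustration only: `maybe_compress`, `estimate_tokens`, the 300-token threshold, and the four-characters-per-token heuristic are all assumptions, and `dedupe_lines` is a toy stand-in for a real compressor, not Headroom's API.

```python
from typing import Callable

# Assumed stand-in for "a few hundred tokens"; tune for your workload.
MIN_TOKENS = 300


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def dedupe_lines(text: str) -> str:
    """Toy compressor: collapse runs of identical consecutive lines."""
    kept: list[str] = []
    for line in text.splitlines():
        if not kept or kept[-1] != line:
            kept.append(line)
    return "\n".join(kept)


def maybe_compress(
    payload: str,
    compress: Callable[[str], str] = dedupe_lines,
    min_tokens: int = MIN_TOKENS,
) -> str:
    """Return the payload untouched when it is too short to be worth compressing."""
    if estimate_tokens(payload) < min_tokens:
        return payload
    return compress(payload)


short_output = "ok\nok"
long_output = "\n".join(["ERROR timeout"] * 200)

print(maybe_compress(short_output) == short_output)  # prints: True
print(maybe_compress(long_output))                   # prints: ERROR timeout
```

The threshold trades CPU time on the request path against tokens saved, so it is worth measuring on real payloads before fixing a value.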
Finally, while CCR is technically reversible, the Kompress-base prose model is lossy, and any agent that needs literal exact-text preservation (legal review, audit trails) should route those payloads around the prose compressor.

For teams building agentic systems that are starting to feel context-window pressure, Headroom is one of the most thoroughly engineered and benchmarked open compression layers currently available, and the three-mode library/proxy/MCP delivery model makes it unusually easy to adopt incrementally.
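The content-type routing described under "The Compression Strategy", together with the exact-text bypass suggested above, can be sketched as a small classifier. This is a simplified illustration under stated assumptions, not Headroom's `ContentRouter`: the function `classify`, its labels, and the `exact_text` flag are hypothetical, and code detection here covers Python only, whereas the article describes multi-language AST-aware handling.

```python
import ast
import json


def classify(payload: str, exact_text: bool = False) -> str:
    """Pick a handling route for a payload (hypothetical labels)."""
    # Payloads that must survive verbatim skip compression entirely.
    if exact_text:
        return "bypass"

    # Structured data: valid JSON whose top level is an object or array.
    try:
        if isinstance(json.loads(payload), (dict, list)):
            return "structured"
    except ValueError:
        pass

    # Code: parses as Python and contains more than bare expressions,
    # so a lone word such as "hello" is not mistaken for source code.
    try:
        tree = ast.parse(payload)
        if any(not isinstance(node, ast.Expr) for node in tree.body):
            return "code"
    except (SyntaxError, ValueError):
        pass

    # Everything else is treated as prose.
    return "prose"


print(classify('{"status": "failed", "retries": 3}'))            # prints: structured
print(classify("def handler(event):\n    return event\n"))       # prints: code
print(classify("The deploy failed twice before the rollback."))  # prints: prose
print(classify("Clause 4.2 shall survive termination.", exact_text=True))  # prints: bypass
```

Checking the bypass flag first means an audit or legal payload never reaches a lossy path, whatever its content looks like.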