
codebase-memory-mcp - Open Source | Evermx

codebase-memory-mcp

DeusData | MIT

Agent | 13.1K Stars | 954 Forks | 4 views

codebase-memory-mcp is a high-performance code intelligence engine built specifically for AI coding agents. It fully indexes an average repository in milliseconds and the entire Linux kernel (28 million lines across 75,000 files) in about three minutes, then answers structural queries in under a millisecond. Distributed as a single static binary for macOS, Linux, and Windows with zero runtime dependencies, it has gathered more than 13,000 GitHub stars under the MIT license and positions itself as the connective memory layer between large codebases and the agents that explore them.

## A Persistent Knowledge Graph for Code

The core idea is to turn a codebase into a queryable knowledge graph rather than forcing an agent to read files one at a time. Using tree-sitter AST analysis across all 158 supported languages, the engine extracts functions, classes, call chains, HTTP routes, and cross-service links into a persistent graph. A Hybrid LSP layer adds semantic type resolution for eleven major languages (Python, TypeScript/JavaScript/JSX/TSX, PHP, C#, Go, C, C++, Java, Kotlin, and Rust), so the graph captures real type relationships, not just syntactic structure. The result is a structural map an agent can traverse with a single query instead of dozens of grep-and-read cycles.

## Token Efficiency as the Headline Benefit

The project's central claim is dramatic token reduction. In the authors' benchmark, five structural queries consume roughly 3,400 tokens through the graph versus about 412,000 tokens via file-by-file search, a ratio of about 120x (412,000 / 3,400 ≈ 121). The accompanying preprint (arXiv:2603.27277) reports evaluation across 31 real-world repositories showing 83% answer quality, 10x fewer tokens, and 2.1x fewer tool calls compared with conventional file exploration. For agents working under tight context budgets, replacing many read operations with one graph query is the difference between staying on-task and running out of context.
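To make the "one graph query instead of many reads" idea concrete, here is a minimal sketch of a transitive-caller lookup over a toy call graph. It is not the project's implementation: the `calls` table, the function names in `EDGES`, and the helpers `build_graph` and `transitive_callers` are all hypothetical, and a SQL recursive query over Python's built-in `sqlite3` stands in for whatever graph query the engine actually runs.

```python
import sqlite3

# Hypothetical schema: one row per "caller -> callee" edge.
# The real engine's storage layout is not described in this article.
def build_graph(edges):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (caller TEXT NOT NULL, callee TEXT NOT NULL)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)", edges)
    return conn

def transitive_callers(conn, target):
    """Return every function that directly or indirectly calls `target`."""
    rows = conn.execute(
        """
        WITH RECURSIVE impacted(name) AS (
            SELECT caller FROM calls WHERE callee = ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cycles terminate
            SELECT c.caller
            FROM calls AS c JOIN impacted AS i ON c.callee = i.name
        )
        SELECT name FROM impacted ORDER BY name
        """,
        (target,),
    ).fetchall()
    return [name for (name,) in rows]

# Illustrative edges only; these function names are made up.
EDGES = [
    ("handle_request", "parse_body"),
    ("handle_request", "save_user"),
    ("save_user", "validate_email"),
    ("import_users", "save_user"),
    ("cli_main", "import_users"),
]

conn = build_graph(EDGES)
print(transitive_callers(conn, "validate_email"))
# prints ['cli_main', 'handle_request', 'import_users', 'save_user']
```

A single query returns the whole set of affected callers. Getting the same answer by search would mean grepping for each function name in turn and reading every file that matches, which is where the token cost of file-by-file exploration comes from.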

## Speed and Architecture

Indexing speed comes from a RAM-first pipeline: LZ4 compression, in-memory SQLite, and fused Aho-Corasick pattern matching, with memory released once indexing completes. The 158 language grammars are vendored and compiled directly into the binary, so there is nothing to install and nothing that breaks on a user's machine. Infrastructure-as-code is treated as a first-class citizen too: Dockerfiles, Kubernetes manifests, and Kustomize overlays are indexed as graph nodes with cross-references, letting agents reason about deployment topology alongside application code.

## Built for Agent Workflows via MCP

The engine exposes 14 MCP tools spanning search, call tracing, architecture overview, impact analysis, raw Cypher queries, dead-code detection, cross-service HTTP linking, and architectural decision record management. Its install command auto-detects eleven coding agents (Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro) and configures MCP entries, instruction files, and pre-tool hooks for each in one step. An optional UI variant serves an interactive 3D graph visualization at localhost:9749 for humans who want to explore the same graph their agent queries. Optional auto-indexing keeps projects current through git-based change detection.

## Security and Considerations

Because the tool reads source and writes to agent configuration files, the maintainers lean heavily on transparency: all processing happens locally, code never leaves the machine, and every release binary is signed, checksummed, and scanned by 70+ antivirus engines, with an OpenSSF Scorecard and SLSA 3 provenance. The main caveats are inherent to its scope. It is an infrastructure component that modifies agent configs, so teams in locked-down environments will want to audit the install step, and its deepest semantic features are limited to the eleven Hybrid LSP languages rather than all 158.
For agent-heavy development on large repositories, however, it addresses one of the most expensive problems in the workflow: efficient, structural code understanding.
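
For teams that want to audit the install step, checking a downloaded binary against its published checksum is a reasonable first gate. The sketch below assumes the checksum is a SHA-256 hex digest, which this article does not confirm. The helper names `sha256_of` and `matches_checksum` are hypothetical, and it does not cover signature or SLSA provenance verification.

```python
import hashlib
import hmac
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large binaries are not loaded into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_checksum(path, expected_hex):
    """Compare the file's digest with the published value.

    Assumption: the published checksum is a SHA-256 hex string.
    Whitespace and letter case in the expected value are normalized first.
    """
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

A caller would pass the path of the downloaded binary and the digest copied from the release page, and refuse to run the installer when `matches_checksum` returns `False`.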

Key Features

Indexes an average repo in milliseconds and the 28M-LOC Linux kernel in ~3 minutes
Persistent tree-sitter knowledge graph of functions, classes, call chains, and routes
Hybrid LSP semantic type resolution for 11 major languages, AST parsing for all 158
Roughly 120x fewer tokens than file-by-file search in the authors' five-query benchmark; 10x fewer across 31 repositories in arXiv:2603.27277
14 MCP tools: search, trace, impact analysis, Cypher queries, dead-code detection
One-command install auto-configures 11 coding agents including Claude Code and Codex
Single static zero-dependency binary for macOS, Linux, and Windows
Optional 3D graph visualization UI and git-based auto-indexing