Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
CodeGraph is an open-source, 100% local code knowledge graph that pre-indexes a project so AI coding agents like Claude Code, Codex CLI, Cursor, and OpenCode can answer architecture questions without scanning files repeatedly. Released under the MIT license by Colby McHenry, the project has climbed to 10,000+ GitHub stars by promising 35% lower agent costs and 70% fewer tool calls on large codebases. It exposes its index as a Model Context Protocol (MCP) server, so any MCP-aware agent can query symbols, call graphs, and impact radius directly instead of spawning expensive exploration sub-agents.

## The Tool-Call Tax

Modern coding agents spend a surprising fraction of their budget just orienting themselves in a repository. Every "find this function," "where is this imported," and "who calls this method" question becomes a flurry of `grep`, `find`, and file-read tool calls. On a 200k-line codebase, an agent can burn through 30 to 50 tool invocations and tens of thousands of tokens before it even starts the actual task. CodeGraph attacks this overhead directly by moving discovery work out of the agent loop and into a precomputed graph the agent can query in one shot.

## Pre-Indexed Knowledge Graph

The core artifact is a local SQLite database that stores symbols, definitions, references, call edges, and file relationships for the entire codebase. The index is built once with `codegraph init -i` and then maintained automatically by native OS file watchers, so saving a file triggers an incremental update rather than a full rebuild. Because everything lives in SQLite, lookups are millisecond-fast and the entire system runs offline with no API keys or telemetry. The native `better-sqlite3` backend is the recommended path; a WASM fallback exists but is reportedly 5 to 10 times slower.

## MCP Tools Exposed to Agents

CodeGraph publishes three primary MCP tools that map cleanly to how agents actually think.
`codegraph_search` is a full-text symbol lookup that returns matching definitions across the codebase. `codegraph_context` pulls a target function plus its immediate neighborhood (callers, callees, related types) so the agent can reason without reading entire files. `codegraph_impact` walks the call graph outward to surface every function transitively affected by a change, which is especially useful for refactor planning. Because these are MCP tools, no custom integration code is required on the agent side, and the same server works for Claude Code, Codex CLI, Cursor, and OpenCode.

## Nineteen-Plus Language Coverage

The parser covers 19+ languages, including TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Vue, Liquid, and Pascal/Delphi. Framework-aware routing recognizes idioms in Django, Flask, Express, NestJS, and Rails, so a query for an HTTP route handler returns the actual controller rather than the middleware glue around it. This breadth matters because polyglot monorepos are the worst case for unaided agents and the best case for a tool like CodeGraph.

## Install in One Command

Getting started is intentionally minimal. Running `npx @colbymchenry/codegraph` launches an interactive setup that registers the MCP server with whichever agent is detected on the machine. After restarting the agent to pick up the server, running `codegraph init -i` inside the project root builds the index in a few seconds to a few minutes, depending on repo size. The auto-sync watcher then takes over and the agent immediately starts using `codegraph_*` tools without any prompt changes.

## Limitations

The gains are largest on big repositories. On a small project where `grep` already finishes in milliseconds, the index adds setup overhead without dramatic savings. Node.js 20 to 24 is required, and files larger than 1 MB are skipped by default along with `node_modules` and similar generated directories.
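The default skip rule described above can be pictured as a simple filter over a directory walk. The sketch below is illustrative Python, not CodeGraph's actual implementation: the function name `indexable_files`, the exact byte cutoff, and every entry in `SKIP_DIRS` other than `node_modules` are assumptions made for the example.

```python
import os

# Assumed reading of the "1MB" cutoff; the real threshold may be defined differently.
MAX_BYTES = 1_000_000
# Only node_modules is named in the text; the other entries are illustrative guesses.
SKIP_DIRS = {"node_modules", ".git", "dist", "build"}


def indexable_files(root, max_bytes=MAX_BYTES, skip_dirs=SKIP_DIRS):
    """Yield file paths under root that pass the directory and size filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files above the cutoff are left out of the index.
            if os.path.getsize(path) <= max_bytes:
                yield path
```

Pruning `dirnames` in place keeps the walk from ever entering a large generated directory, which is the point of skipping it in the first place.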
CodeGraph also focuses on structural code understanding rather than runtime behavior; it will not tell an agent which branch is hot in production or how a function performs under load. Finally, because the index is local and unencrypted, teams handling secret-laden code should treat the SQLite database with the same care as the source itself.
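
To make the idea of a SQLite-backed call graph and the transitive walk behind `codegraph_impact` more concrete, here is a minimal Python sketch. It does not reproduce CodeGraph's real schema or queries: the `symbols` and `calls` tables, the sample data, and the `impact` helper are all hypothetical, chosen only to show how a recursive query can answer "what is affected if this function changes" in a single lookup.

```python
import sqlite3


def build_demo_graph():
    """Create a toy in-memory index; the schema is hypothetical, not CodeGraph's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT NOT NULL, file TEXT NOT NULL);
        CREATE TABLE calls (
            caller INTEGER NOT NULL REFERENCES symbols(id),
            callee INTEGER NOT NULL REFERENCES symbols(id)
        );
    """)
    conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", [
        (1, "parse_config", "config.py"),
        (2, "load_settings", "settings.py"),
        (3, "start_server", "server.py"),
        (4, "main", "cli.py"),
        (5, "format_date", "util.py"),
    ])
    conn.executemany("INSERT INTO calls VALUES (?, ?)", [
        (2, 1),  # load_settings calls parse_config
        (3, 2),  # start_server calls load_settings
        (4, 3),  # main calls start_server
        (4, 5),  # main calls format_date
    ])
    return conn


def impact(conn, name):
    """Return names of all functions that directly or transitively call `name`."""
    rows = conn.execute("""
        WITH RECURSIVE affected(id) AS (
            SELECT id FROM symbols WHERE name = ?
            UNION  -- UNION (not UNION ALL) deduplicates, so cyclic call graphs terminate
            SELECT c.caller FROM calls c JOIN affected a ON c.callee = a.id
        )
        SELECT s.name FROM symbols s JOIN affected a ON s.id = a.id
        WHERE s.name != ?
        ORDER BY s.name
    """, (name, name)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_graph()
print(impact(conn, "parse_config"))  # ['load_settings', 'main', 'start_server']
```

In this toy graph a change to `parse_config` reaches `main` through two intermediate callers, and the answer comes back from one query rather than a sequence of `grep` and file-read calls.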