ChunkHound
Your entire codebase, deeply understood
Multi-hop semantic search. Architecture research. Cited reports. Auto-generated documentation.
100% local · 33 languages · MIT licensed · Free forever
uv tool install chunkhound
“We’ve repeatedly demonstrated compressing what used to take 2–3 months down to 2–3 days. Claude Code orchestrates tens to hundreds of deep research calls over our monorepo, zooming into each subsection systematically.” — Ofir Rozenfeld, AI Transformation Product Manager at Applied Materials
AI writes code blind
Your AI agent generates code without understanding your architecture. It doesn't know that the auth middleware chains through three files, that your data model changed six months ago, or that there's already a utility for exactly what it's about to write. ChunkHound gives every AI agent deep, structured understanding of your architecture — so the first attempt is the right one.
THE APPROACH
Three lenses into one semantic index
ChunkHound parses your code with cAST — AST-aware chunking backed by
Carnegie Mellon research (arXiv:2506.15655). It builds a semantic graph that search, research,
and documentation all draw from. Not three separate tools — three lenses into one index.
TOKEN EFFICIENCY
A retry is just a token you didn't have to spend.
When an agent doesn't know your architecture, it guesses. When it guesses wrong, it loops — regenerating on context already polluted by the failed attempt. ChunkHound spends tokens once: a research call that surfaces the auth chain, the utility that already exists, the data model that changed last quarter. Users consistently report the same pattern across codebases of every size.
[Comparison graphic: Without context / With ChunkHound]
“I would pay for ChunkHound. The code research is that good.” — @FlatTreNeb, Reddit
CAPABILITIES
Seven capabilities, one semantic index
Multi-Hop Semantic Search
Discovers code through semantic bridges. Three hops find architectural relationships that keyword search misses entirely.
3-hop traversal
Gap Detection & Filling
Clusters results, identifies missing concepts via LLM analysis, uses elbow detection to filter noise. Finds what you didn't know to ask for.
Elbow detection
cAST Chunking
AST-aware code splitting from Carnegie Mellon research. +4.3 recall on RepoEval, +2.67 pass@1 on SWE-bench vs. naive chunking.
arXiv:2506.15655
Depth Exploration
Explores different angles of files already in results. A file with auth logic and session management gets explored for both.
Aspect-based queries
Code Research
One call produces a cited markdown architecture report. Query expansion, gap detection, evidence ledger, map-reduce synthesis.
Cited reports
Unified Semantic + Regex
Extracts symbols from semantic results, runs parallel regex searches, reranks everything against the root query. Two paradigms, unified.
Hybrid search
Auto-Documentation
Generates a searchable documentation site from your codebase. Five comprehensiveness levels from minimal to exhaustive.
5 levels
“Chunkhound has been an absolute beast in reading multiple large repos in such extents that would take a human days or the plain claude code setup multiple attempts to get it (at least somewhat) right.” — @flrk, co-founder of Kimara AI
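Gap Detection & Filling mentions elbow detection as the way noise gets filtered out of ranked results. The page does not say how ChunkHound computes the elbow, so the following is only a rough illustration and not its actual implementation: a minimal Python sketch of a simple stand-in, a largest-drop cutoff, which cuts a ranked list just before the biggest fall between neighboring scores. The function name and the sample scores are hypothetical.

```python
def largest_drop_cutoff(scores: list[float]) -> int:
    """Return how many leading results to keep.

    `scores` must be sorted in descending order. The cut is placed just
    before the largest drop between two neighboring scores, which is a
    simple approximation of an elbow in the score curve.
    """
    if len(scores) < 2:
        return len(scores)  # nothing to compare against

    # Drop between each result and the one ranked right after it.
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    largest = max(drops)
    if largest <= 0:
        return len(scores)  # flat curve: no elbow, keep everything

    # Keep every result up to and including the one before the largest drop.
    return drops.index(largest) + 1


if __name__ == "__main__":
    # Hypothetical relevance scores, already ranked best-first.
    ranked = [0.92, 0.90, 0.88, 0.45, 0.42, 0.40]
    keep = largest_drop_cutoff(ranked)
    print(ranked[:keep])
```

With these sample scores the sharp fall from 0.88 to 0.45 marks the cut, so only the first three results survive. A production system would likely use a more robust elbow criterion; this sketch only shows the idea of letting the score curve decide the cutoff instead of a fixed top-k.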
PROVEN AT SCALE
Your laptop outperforms their cluster.
Sourcegraph needs Kubernetes. Augment Code needs their servers. GitHub Copilot's @workspace tops out before your monorepo does. ChunkHound indexes your entire codebase on your own machine — no cloud account, no vendor, no code leaving your network. MIT licensed. Community built. Applied Materials runs 50M+ lines on a developer laptop today.
| | The alternatives | ChunkHound |
|---|---|---|
| Kubernetes cluster | required | not needed |
| H100 / GPU server | required | not needed |
| Vendor subscription | required | MIT licensed |
| Code leaves network | yes | never |
| Closed, proprietary | yes | open source |
50M+ lines · 1 developer laptop · 0 bytes sent · MIT licensed · production today
Connect to your AI agent
echo .chunkhound.json >> .gitignore
cat > .chunkhound.json <<'CHUNKHOUND_EOF'
{
  "embedding": {
    "provider": "voyageai",
    "model": "voyage-3.5",
    "api_key": "<YOUR_VOYAGE_API_KEY>"
  },
  "llm": {
    "provider": "anthropic",
    "api_key": "<YOUR_ANTHROPIC_API_KEY>"
  }
}
CHUNKHOUND_EOF
mkdir -p .cursor
cat > .cursor/mcp.json <<'CHUNKHOUND_EOF'
{
  "mcpServers": {
    "ChunkHound": {
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF
chunkhound index .
Local setup
Set up the models before running ChunkHound:
.chunkhound.json holds your API keys. The first command adds it to .gitignore so you don't commit secrets. Replace the <YOUR_*_API_KEY> placeholders with real keys before running. Local OpenAI-compatible backends still need an explicit model.
Need Azure OpenAI, a self-hosted endpoint, or a proxy?
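Since the setup depends on replacing the <YOUR_*_API_KEY> placeholders, a quick pre-flight check can catch a forgotten one before indexing. The snippet below is a hypothetical helper, not part of ChunkHound: a minimal Python sketch that walks the parsed .chunkhound.json and reports any string value that still looks like a `<YOUR_...>` placeholder. The function name and the placeholder pattern are assumptions based on the config shown above.

```python
import json
import re
from pathlib import Path

# Assumed placeholder shape, e.g. "<YOUR_VOYAGE_API_KEY>".
PLACEHOLDER = re.compile(r"^<YOUR_[A-Z0-9_]+>$")


def find_placeholders(value, path=""):
    """Return dotted paths of string values that still look like placeholders."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            found.extend(find_placeholders(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_placeholders(child, f"{path}[{index}]"))
    elif isinstance(value, str) and PLACEHOLDER.match(value):
        found.append(path)
    return found


if __name__ == "__main__":
    config_path = Path(".chunkhound.json")
    if not config_path.exists():
        print("No .chunkhound.json found in the current directory.")
    else:
        leftover = find_placeholders(json.loads(config_path.read_text()))
        if leftover:
            print("Replace these placeholders before indexing:", ", ".join(leftover))
        else:
            print("No placeholder keys found.")
```

Run it from the project root before `chunkhound index .`; it only reads the file and never prints the key values themselves.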