ChunkHound — Your engineering context, deeply understood

Your entire engineering context, deeply understood

Open-source codebase intelligence that gives agents and teams cited context across current code, git history, and technical web research.

Local-first · Dozens of languages & file types · Cited answers · Git history research · Pinpoint web research

uv tool install chunkhound

“We’ve repeatedly demonstrated compressing what used to take 2–3 months down to 2–3 days. Claude Code orchestrates tens to hundreds of deep research calls over our monorepo, zooming into each subsection systematically.” — Ofir Rozenfeld, AI Transformation Product Manager at Applied Materials

AI writes code blind

Agents can generate code, but they still miss the context that makes software safe to change: how behavior flows across files, what changed across a branch or release, and which external constraints matter. Reviewers, support, and product teams hit the same wall when large PRs, merge conflicts, bugs, and release notes need implementation-backed explanation instead of guesses. ChunkHound turns current code, git history, and technical web research into cited context before anyone edits, reviews, debugs, or explains software.

USE CASES

Deep understanding, applied to real work

The same cited context supports the jobs where missing context hurts most: editing code, understanding changes, debugging incidents, and reconciling implementation with external docs.

Research before editing

Give agents cited architecture context, relevant files, recent changes, and external constraints before they generate code.

Try asking: "How does auth work?" then "What changed in the last 20 commits?"

Understand large PRs and releases

Turn branch diffs, commit ranges, tags, and specific commits into cited reviewer briefs, release summaries, and changelog drafts.

Try asking: "What changed on main..HEAD for reviewers?"

Trace symptoms to code paths

Turn stack traces, webhook failures, and customer reports into cited explanations grounded in code, history, and external constraints.

Try asking: "Why would webhook retries fail?"

Reconcile code with external docs

Pinpoint cited docs, APIs, issues, and articles, then connect external evidence to local implementation context.

Try asking: "Do our OAuth refresh tokens match current guidance?"

THE APPROACH

Deep understanding needs more than search

ChunkHound parses your code with cAST — AST-aware chunking backed by Carnegie Mellon research (arXiv:2506.15655). It then grounds answers in the places software understanding actually lives: current code, architecture, git history, external docs, and generated knowledge.

Current code: semantic and regex search

Architecture: cited cross-file research

Git history: semantic search over commits, ranges, and branches

External context: pinpoint web research

Shared knowledge: autodoc from research

One codebase context layer

TOKEN EFFICIENCY

Every retry you avoid is tokens you didn't have to spend.

When an agent doesn't understand your architecture, it guesses. Wrong guesses turn into retries, polluted context, and wasted tokens. ChunkHound shifts that work earlier: one research call can surface the auth chain, the existing utility, and the data model change before code gets written.

Research-first workflow

Research call: architecture mapped, changes understood, patterns surfaced.

Attempt 1: done.

“I would pay for ChunkHound. The code research is that good.” — @FlatTreNeb, Reddit

CAPABILITIES

Capabilities behind deep codebase understanding

Multi-Hop Semantic Search

Follows semantic bridges across files and subsystems. Three hops surface architectural relationships that plain text matching misses.

3-hop traversal
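
To make the idea of hops concrete, here is a minimal sketch of multi-hop expansion over toy data. It is not ChunkHound's implementation: the 2-D vectors, the chunk names, and the `multi_hop_search` function with its `per_hop` and `min_sim` parameters are all assumptions made for illustration. Each hop reuses the chunks found in the previous hop as new queries, so a chunk with no direct similarity to the question can still be reached.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_hop_search(query_vec, chunks, hops=3, per_hop=1, min_sim=0.5):
    """Hypothetical helper: expand from the query through up to `hops` rounds.

    chunks maps a chunk name to its embedding vector.
    Returns chunk names in the order they were discovered.
    """
    found = []
    frontier = [query_vec]
    for _ in range(hops):
        next_frontier = []
        for vec in frontier:
            # Rank every not-yet-found chunk against the current frontier vector.
            candidates = sorted(
                ((cosine(vec, v), name) for name, v in chunks.items() if name not in found),
                reverse=True,
            )
            for sim, name in candidates[:per_hop]:
                if sim >= min_sim:
                    found.append(name)
                    next_frontier.append(chunks[name])
        frontier = next_frontier
        if not frontier:
            break
    return found


# Toy embeddings (assumed values): each chunk is close to the next one in the
# chain, but the query is only close to the first.
chunks = {
    "login_handler": (0.9, 0.3),
    "session_store": (0.5, 0.8),
    "token_refresh": (0.0, 1.0),
    "image_resize": (-1.0, 0.0),
}
query = (1.0, 0.0)

print(multi_hop_search(query, chunks, hops=1))
print(multi_hop_search(query, chunks, hops=3))
```

With one hop only `login_handler` is found. With three hops the search walks on to `session_store` and `token_refresh`, even though `token_refresh` has zero direct similarity to the query, while the unrelated `image_resize` chunk is never reached.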

Gap Detection & Filling

Finds missing parts of the answer by clustering evidence, spotting uncovered concepts, and filtering noise before synthesis.

Elbow detection
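
As a rough illustration of elbow detection, the sketch below uses one simple heuristic: sort relevance scores and cut at the largest drop between neighbors. The source does not say how ChunkHound finds its elbow, so the `elbow_cutoff` function and the sample scores are assumptions, not the project's actual method.

```python
def elbow_cutoff(scores):
    """Hypothetical helper: how many top-ranked results to keep.

    Uses a largest-gap heuristic: the cut falls just before the biggest
    drop between consecutive scores in descending order.
    """
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return len(ranked)
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    biggest = max(range(len(drops)), key=drops.__getitem__)
    return biggest + 1


# Assumed relevance scores: three strong matches, then noise.
scores = [0.92, 0.90, 0.88, 0.45, 0.40, 0.38]
keep = elbow_cutoff(scores)
print(keep, sorted(scores, reverse=True)[:keep])
```

Here the biggest drop is between 0.88 and 0.45, so the three strong matches are kept and the low-scoring tail is filtered out without a hand-tuned threshold.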

cAST Chunking

Code-aware chunking from Carnegie Mellon research that improves retrieval quality before search and research even begin.

arXiv:2506.15655

Depth Exploration

Explores promising files from multiple angles, so one subsystem can be understood by behavior, responsibilities, and patterns.

Aspect-based queries

Code Research

One call produces a cited markdown explanation of how the system works, grounded in the files and components behind the answer.

Cited reports

Unified Semantic + Regex

Combines conceptual discovery with exact symbol and regex tracing, then reranks everything against the original question.

Hybrid search
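
The hybrid pattern described above can be sketched in a few lines. This is an illustration under stated assumptions, not ChunkHound's code: `hybrid_search` and `word_overlap` are hypothetical names, the semantic scores are made-up numbers standing in for embedding similarity, and word overlap stands in for a real reranking model.

```python
import re


def word_overlap(question, text):
    """Stand-in reranker: fraction of question words that appear in the text."""
    q = set(re.findall(r"\w+", question.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(question, chunks, semantic_scores, pattern, rerank, top_k=2):
    """Hypothetical helper: merge semantic and regex hits, then rerank them."""
    # Conceptual discovery: the top-k chunks by (precomputed) semantic score.
    semantic = sorted(semantic_scores, key=semantic_scores.get, reverse=True)[:top_k]
    # Exact tracing: every chunk whose text matches the regex.
    exact = [name for name, text in chunks.items() if re.search(pattern, text)]
    # Union of both candidate sets, without duplicates.
    candidates = list(dict.fromkeys(semantic + exact))
    # Rerank everything against the original question.
    return sorted(candidates, key=lambda name: rerank(question, chunks[name]), reverse=True)


chunks = {
    "retry.py": "def schedule_retry(webhook): backoff for failed webhook delivery",
    "billing.py": "def charge(card): bill the customer",
    "queue.py": "def enqueue(job): push job; calls schedule_retry on failure",
    "docs.md": "webhook retries use exponential backoff",
}
# Assumed embedding similarities for the question below.
semantic_scores = {"docs.md": 0.81, "retry.py": 0.77, "queue.py": 0.30, "billing.py": 0.12}

results = hybrid_search(
    "why do webhook retries fail",
    chunks,
    semantic_scores,
    pattern=r"\bschedule_retry\b",
    rerank=word_overlap,
)
print(results)
```

The regex pass pulls in `queue.py`, which the semantic pass ranked too low to return, and the final ordering comes from the reranker rather than from either retriever alone.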

Git History Research

Ask by last N commits, commit hash, tag, branch, or range to explain large PRs, release ranges, and why behavior changed.

--last-n · --commit-range · --commit-hash

Pinpoint Web Research

Finds the cited external docs, APIs, issues, and articles your implementation depends on, then brings them into agent context.

Cited URL research

Auto-Documentation

Turns research-backed understanding into a searchable documentation site generated directly from the codebase.

5 levels

“Chunkhound has been an absolute beast in reading multiple large repos in such extents that would take a human days or the plain claude code setup multiple attempts to get it (at least somewhat) right.” — @flrk, co-founder of Kimara AI

PROVEN AT SCALE

Your laptop outperforms their cluster.

ChunkHound indexes your entire codebase on your own machine — no Kubernetes, no GPU servers, no vendor lock-in — and can run fully local with local providers. Open source, MIT licensed, and proven on 50M+ line codebases in production today.

	Typical setup	ChunkHound
Kubernetes cluster	often required	not needed
H100 / GPU server	often required	not needed
Vendor subscription	often required	MIT licensed
Zero code egress	rare	with local providers
Closed, proprietary	yes	open source

Applied Materials

50M+ lines · 1 developer laptop · local-provider option · MIT licensed · production today

Connect to your AI agent

Works with any MCP-compatible agent, plus any OpenAI-compatible backend.

echo .chunkhound.json >> .gitignore
cat > .chunkhound.json <<'CHUNKHOUND_EOF'
{
  "embedding": {
    "provider": "voyageai",
    "model": "voyage-3.5",
    "api_key": "<YOUR_VOYAGE_API_KEY>"
  },
  "llm": {
    "provider": "anthropic",
    "api_key": "<YOUR_ANTHROPIC_API_KEY>"
  }
}
CHUNKHOUND_EOF
mkdir -p .cursor
cat > .cursor/mcp.json <<'CHUNKHOUND_EOF'
{
  "mcpServers": {
    "ChunkHound": {
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF
chunkhound index .

.chunkhound.json holds your API keys

The first command adds .chunkhound.json to .gitignore so you don't commit secrets. Replace the <YOUR_*_API_KEY> placeholders with real keys before running. Local OpenAI-compatible backends still need an explicit model.
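
A small pre-flight check can catch both mistakes before indexing. The sketch below is not part of ChunkHound: `config_problems` is a hypothetical helper, and treating any section with a `base_url` as a local OpenAI-compatible backend is an assumption. The field names come from the config snippets on this page.

```python
import json
import re

# Matches leftover placeholders such as <YOUR_VOYAGE_API_KEY>.
PLACEHOLDER = re.compile(r"^<YOUR_[A-Z_]+_API_KEY>$")


def config_problems(config):
    """Hypothetical helper: list obvious mistakes in a .chunkhound.json dict."""
    problems = []
    for section in ("embedding", "llm"):
        settings = config.get(section, {})
        if PLACEHOLDER.match(settings.get("api_key", "")):
            problems.append(f"{section}: api_key is still a placeholder")
        # Assumption: a base_url means a self-hosted OpenAI-compatible backend,
        # which needs an explicit model.
        if "base_url" in settings and not settings.get("model"):
            problems.append(f"{section}: base_url is set but model is missing")
    return problems


example = json.loads("""
{
  "embedding": {
    "provider": "voyageai",
    "model": "voyage-3.5",
    "api_key": "<YOUR_VOYAGE_API_KEY>"
  },
  "llm": {
    "provider": "openai",
    "base_url": "http://localhost:11434/v1"
  }
}
""")

for problem in config_problems(example):
    print(problem)
```

To check a real file, load it with `json.load(open(".chunkhound.json"))` and pass the result to the same function.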


Embedding options (the "embedding" block):

{"provider":"voyageai","model":"voyage-3.5","api_key":"<YOUR_VOYAGE_API_KEY>"}
{"provider":"openai","model":"text-embedding-3-small","api_key":"<YOUR_OPENAI_API_KEY>"}
{"provider":"openai","model":"qwen3-embedding","base_url":"http://localhost:11434/v1","rerank_model":"qwen3-reranker","rerank_format":"cohere"}
{"provider":"openai","model":"Qwen/Qwen3-Embedding-0.6B","base_url":"http://localhost:8000/v1","rerank_model":"Qwen/Qwen3-Reranker-0.6B","rerank_format":"cohere"}

LLM options (the "llm" block):

{"provider":"anthropic","api_key":"<YOUR_ANTHROPIC_API_KEY>"}
{"provider":"openai","api_key":"<YOUR_OPENAI_API_KEY>"}
{"provider":"codex-cli"}
{"provider":"claude-code-cli"}
{"provider":"gemini","model":"gemini-3.5-flash","api_key":"<YOUR_GEMINI_API_KEY>"}
{"provider":"deepseek","model":"deepseek-v4-flash","api_key":"<YOUR_DEEPSEEK_API_KEY>"}
{"provider":"grok","model":"grok-4.3","api_key":"<YOUR_XAI_API_KEY>"}
{"provider":"openai","model":"qwen3-coder:30b","base_url":"http://localhost:11434/v1"}
{"provider":"openai","model":"Qwen/Qwen3-Coder-30B-A3B-Instruct","base_url":"http://localhost:8000/v1"}
{"provider":"opencode-cli"}

Cursor:

mkdir -p .cursor
cat > .cursor/mcp.json <<'CHUNKHOUND_EOF'
{
  "mcpServers": {
    "ChunkHound": {
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF

Claude Code:

claude mcp add ChunkHound -- chunkhound mcp

VS Code:

mkdir -p .vscode
cat > .vscode/mcp.json <<'CHUNKHOUND_EOF'
{
  "servers": {
    "ChunkHound": {
      "type": "stdio",
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF

opencode:

cat > opencode.json <<'CHUNKHOUND_EOF'
{
  "mcp": {
    "ChunkHound": {
      "type": "local",
      "command": [
        "chunkhound",
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF

Codex:

codex mcp add ChunkHound -- chunkhound mcp

Windsurf:

mkdir -p ~/.codeium/windsurf
cat > ~/.codeium/windsurf/mcp_config.json <<'CHUNKHOUND_EOF'
{
  "mcpServers": {
    "ChunkHound": {
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF

Roo Code:

mkdir -p .roo
cat > .roo/mcp.json <<'CHUNKHOUND_EOF'
{
  "mcpServers": {
    "ChunkHound": {
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF

Zed:

cat > settings.json <<'CHUNKHOUND_EOF'
{
  "context_servers": {
    "chunkhound": {
      "command": "chunkhound",
      "args": [
        "mcp"
      ]
    }
  }
}
CHUNKHOUND_EOF