ChunkHound

Don't search your code. Research it.

Your AI assistant searches code but doesn’t understand it. ChunkHound researches your codebase—extracting architecture, patterns, and institutional knowledge at any scale.

Your AI assistant helps you code but lacks critical context: missing code context, duplicate functions, breaking patterns, lost architecture, conflicting specs, and scale overwhelm.

ChunkHound gives AI the context it needs - deep understanding of your code, files, and architectural decisions before writing anything.

Your best engineers understand more than where code lives—they know why it exists, how it connects, and when it matters. ChunkHound extracts that same depth of understanding from your entire codebase, dynamically, at any scale.

The Code Research tool explores your code like an experienced engineer would—following references, understanding patterns, synthesizing insights across 29 file types and millions of lines.

Two-Layer Architecture: Best of Both Worlds

ChunkHound provides both traditional RAG capabilities AND intelligent orchestration for deep exploration:

Base Layer: Search Tools

Like traditional RAG systems, ChunkHound maintains an index and provides search tools, but with critical improvements:

  • cAST chunking: Structure-aware code segmentation (4.3 point gain on retrieval benchmarks); see the sketch after this list
  • Semantic search: Natural language queries via HNSW vector indexing
  • Regex search: Exact pattern matching for comprehensive symbol coverage
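
To make the cAST idea concrete, here is a minimal sketch of structure-aware chunking. It is an illustration rather than ChunkHound's implementation: Node stands in for a parsed syntax-tree node (ChunkHound parses with tree-sitter), and the character budget and greedy split-then-merge policy are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str                                  # source text covered by this node
    children: list["Node"] = field(default_factory=list)

def cast_chunks(node: Node, budget: int = 1200) -> list[str]:
    """Pack whole syntax nodes into chunks of up to `budget` characters,
    recursing only into nodes that are too large on their own, so chunk
    boundaries land on functions and classes instead of arbitrary offsets."""
    if len(node.text) <= budget or not node.children:
        return [node.text]
    chunks: list[str] = []
    current = ""
    for child in node.children:
        if len(child.text) > budget:           # oversized child: split it recursively
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(cast_chunks(child, budget))
        elif len(current) + len(child.text) > budget:
            chunks.append(current)             # budget reached: close the chunk
            current = child.text
        else:
            current += child.text              # merge small siblings into one chunk
    if current:
        chunks.append(current)
    return chunks
```

Compared with fixed-size text windows, chunks produced this way keep whole definitions together, which is the intuition behind the reported retrieval gain.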

Orchestration Layer: Code Research Sub-Agent


The Code Research tool is a specialized orchestration layer that uses the base search tools strategically (the loop is sketched below):

  • Multi-hop exploration: BFS traversal discovering architectural relationships
  • Query expansion: Multiple semantic entry points to cast wider nets
  • Follow-up generation: Iterative questioning based on discovered code
  • Adaptive scaling: Token budgets automatically scale from 30k to 150k based on repository size
  • Map-reduce synthesis: Handles millions of lines without context collapse

The result: Virtual Graph RAG behavior through orchestration, not explicit graph construction.
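
A minimal sketch of that orchestration, assuming stand-in callables for the pieces involved (semantic_search for the base-layer tool; expand_query, generate_followups, and synthesize for the LLM-driven steps). The depth limit is an arbitrary example value, not ChunkHound's configuration.

```python
from collections import deque
from typing import Callable

def research(question: str,
             semantic_search: Callable[[str], list[str]],
             expand_query: Callable[[str], list[str]],
             generate_followups: Callable[[str, list[str]], list[str]],
             synthesize: Callable[[str, list[str]], str],
             max_depth: int = 3) -> str:
    """BFS over the codebase: expand the question into several search queries,
    run them, then queue follow-up questions about whatever was found."""
    frontier = deque((q, 0) for q in expand_query(question))   # query expansion
    seen: set[str] = set()
    findings: list[str] = []
    while frontier:
        query, depth = frontier.popleft()
        if query in seen or depth > max_depth:                 # dedupe and bound the hops
            continue
        seen.add(query)
        hits = semantic_search(query)                          # base-layer search tool
        findings.extend(hits)
        for follow_up in generate_followups(question, hits):   # multi-hop step
            frontier.append((follow_up, depth + 1))
    return synthesize(question, findings)                      # map-reduce synthesis
```

Because relationships are discovered at query time by following what each search returns, there is no graph database to build or keep in sync; that is what makes the Graph RAG behavior "virtual".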

What this means for you:

  • Use what you need: Direct semantic/regex search for quick lookups, Code Research for architectural exploration
  • Zero upfront cost: No entity extraction, no graph database to maintain
  • Query-adaptive: Simple questions get fast answers, complex questions trigger deep exploration automatically
  • Scales to monorepos: Orchestration layer adapts exploration depth and synthesis budgets to codebase size

Compare approaches:

| Approach | Base Capability | Orchestration | Monorepo Scale | Maintenance |
| --- | --- | --- | --- | --- |
| Keyword Search | Exact matching | None | ✓ Fast | None |
| Traditional RAG | Semantic search | None | ✓ Scales | Re-index files |
| Knowledge Graphs | Relationship queries | Pre-computed | ✗ Expensive | Continuous sync |
| ChunkHound | Semantic + Regex | Code Research sub-agent | ✓ Automatic | Automatic (incremental + realtime) |

Battle-tested at monorepo scale:

  • Millions of lines across multi-language codebases
  • 29 languages and formats with AST-aware parsing (Python, TypeScript, Go, Rust, C++, Java, and more)
  • 5 minutes from installation to first deep research query
  • Zero cloud dependencies - your code stays local, searches stay fast
  • Automatic scaling - token budgets and exploration depth adapt to repository size

Ideal for:

  • Large monorepos with cross-team dependencies and circular references
  • Multi-language projects requiring consistent search across all code
  • Security-sensitive codebases that can’t use cloud-based code search
  • Offline development environments or air-gapped systems

Built on proven foundations:
Tree-sitter for parsing • DuckDB for local vector search • MCP for AI integration

Stop recreating code. Start with deep understanding.

Latest Updates

Stay up to date with ChunkHound's latest features and improvements.

Scalable Code Analysis

Map-reduce synthesis breaks complex queries into parallel subtasks, preventing context collapse on multi-million LOC codebases (see the sketch after this list).
  • Numbered citations [1][2][3] replace verbose file.py:123 references
  • New chunkhound research CLI command for direct code analysis
  • Automatic query expansion with deduplication casts wider semantic nets
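
A rough sketch of the map-reduce step, with summarize standing in for an LLM call and the group size chosen arbitrarily for the example:

```python
from typing import Callable

def map_reduce_synthesis(question: str,
                         findings: list[str],
                         summarize: Callable[[str, list[str]], str],
                         group_size: int = 20) -> str:
    """Summarize findings in independent groups (map), then merge the partial
    summaries (reduce), so no single prompt has to hold everything at once."""
    if not findings:
        return ""
    # Map: summarize each group independently (parallelizable in practice).
    partial = [summarize(question, findings[i:i + group_size])
               for i in range(0, len(findings), group_size)]
    # Reduce: fold the partial summaries until one answer remains.
    while len(partial) > 1:
        partial = [summarize(question, partial[i:i + group_size])
                   for i in range(0, len(partial), group_size)]
    return partial[0]
```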

Indexing Performance

10-100x faster indexing via native git bindings, parallel directory discovery, and ProcessPoolExecutor for CPU-bound parsing.
  • RapidYAML parser handles large k8s manifests 10-100x faster than tree-sitter
  • 7 new AST-aware parsers: Swift, Objective-C, Zig, Haskell, HCL, Vue, PHP (29+ total)
  • Provider-aware embedding batching optimizes API throughput (OpenAI: 8, VoyageAI: 40), sketched below
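
A simple illustration of provider-aware batching; the per-provider sizes mirror the numbers above, while the function itself and the fallback size are assumptions for the example.

```python
from typing import Iterator

# Per-provider batch sizes taken from the release note above.
PROVIDER_BATCH_SIZES = {"openai": 8, "voyageai": 40}

def embedding_batches(texts: list[str], provider: str) -> Iterator[list[str]]:
    """Yield embedding requests sized to the provider's optimal batch."""
    size = PROVIDER_BATCH_SIZES.get(provider, 16)   # fallback size is an assumption
    for i in range(0, len(texts), size):
        yield texts[i:i + size]
```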

Production Tooling

New CLI commands and integrations for production workflows and debugging.
  • simulate (dry-run), diagnose (compare ChunkHound vs git rules), calibrate (auto-tune batch sizes)
  • TEI reranker format support - two-stage retrieval with cross-encoder, no vendor lock-in (see the sketch after this list)
  • Repo-aware gitignore engine prevents rule leakage between sibling repos
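
A sketch of the two-stage pattern that reranker support enables: a fast vector search produces candidates, then a cross-encoder scores each query-candidate pair. Here vector_search and rerank_scores are stand-ins for the HNSW index and a TEI-compatible reranker endpoint, and the candidate counts are arbitrary.

```python
from typing import Callable

def two_stage_retrieve(query: str,
                       vector_search: Callable[[str, int], list[str]],
                       rerank_scores: Callable[[str, list[str]], list[float]],
                       first_k: int = 50,
                       final_k: int = 10) -> list[str]:
    """Stage 1 trades precision for recall; stage 2 reranks the short list with
    a cross-encoder that sees the query and each candidate together."""
    candidates = vector_search(query, first_k)        # stage 1: fast approximate recall
    scores = rerank_scores(query, candidates)         # stage 2: precise pairwise scoring
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:final_k]]
```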