ChunkHound

Don't search your code. Research it.

Your AI assistant searches code but doesn’t understand it. ChunkHound researches your codebase—extracting architecture, patterns, and institutional knowledge at any scale.

Your AI assistant helps you code but lacks critical context: missing code context, duplicate functions, breaking patterns, lost architecture, conflicting specs, and scale overwhelm.

ChunkHound gives AI the context it needs - deep understanding of your code, files, and architectural decisions before writing anything.

Your best engineers understand more than where code lives—they know why it exists, how it connects, and when it matters. ChunkHound extracts that same depth of understanding from your entire codebase, dynamically, at any scale.

The Code Research tool explores your code like an experienced engineer would—following references, understanding patterns, synthesizing insights across 29 file types and millions of lines.

Two-Layer Architecture: Best of Both Worlds

ChunkHound provides both traditional RAG capabilities AND intelligent orchestration for deep exploration:

Base Layer: Search Tools

Like traditional RAG systems, ChunkHound maintains an index and provides search tools, but with critical improvements:

  • cAST chunking: Structure-aware code segmentation (4.3 point gain on retrieval benchmarks); see the sketch after this list
  • Semantic search: Natural language queries via HNSW vector indexing
  • Regex search: Exact pattern matching for comprehensive symbol coverage
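
To make the cAST idea concrete, here is a minimal sketch of structure-aware chunking. It is an illustration rather than ChunkHound's implementation: Node stands in for a parsed syntax-tree node (ChunkHound parses with tree-sitter), and the character budget and greedy split-then-merge policy are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str                                  # source text covered by this node
    children: list["Node"] = field(default_factory=list)

def cast_chunks(node: Node, budget: int = 1200) -> list[str]:
    """Pack whole syntax nodes into chunks of up to `budget` characters,
    recursing only into nodes that are too large on their own, so chunk
    boundaries land on functions and classes instead of arbitrary offsets."""
    if len(node.text) <= budget or not node.children:
        return [node.text]
    chunks: list[str] = []
    current = ""
    for child in node.children:
        if len(child.text) > budget:           # oversized child: split it recursively
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(cast_chunks(child, budget))
        elif len(current) + len(child.text) > budget:
            chunks.append(current)             # budget reached: close the chunk
            current = child.text
        else:
            current += child.text              # merge small siblings into one chunk
    if current:
        chunks.append(current)
    return chunks
```

Compared with fixed-size text windows, chunks produced this way keep whole definitions together, which is the intuition behind the reported retrieval gain.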

Orchestration Layer: Code Research Sub-Agent


The Code Research tool is a specialized orchestration layer that uses the base search tools strategically (the loop is sketched below):

  • Multi-hop exploration: BFS traversal discovering architectural relationships
  • Query expansion: Multiple semantic entry points to cast wider nets
  • Follow-up generation: Iterative questioning based on discovered code
  • Adaptive scaling: Token budgets automatically scale from 30k to 150k based on repository size
  • Map-reduce synthesis: Handles millions of lines without context collapse

The result: Virtual Graph RAG behavior through orchestration, not explicit graph construction.
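
A minimal sketch of that orchestration, assuming stand-in callables for the pieces involved (semantic_search for the base-layer tool; expand_query, generate_followups, and synthesize for the LLM-driven steps). The depth limit is an arbitrary example value, not ChunkHound's configuration.

```python
from collections import deque
from typing import Callable

def research(question: str,
             semantic_search: Callable[[str], list[str]],
             expand_query: Callable[[str], list[str]],
             generate_followups: Callable[[str, list[str]], list[str]],
             synthesize: Callable[[str, list[str]], str],
             max_depth: int = 3) -> str:
    """BFS over the codebase: expand the question into several search queries,
    run them, then queue follow-up questions about whatever was found."""
    frontier = deque((q, 0) for q in expand_query(question))   # query expansion
    seen: set[str] = set()
    findings: list[str] = []
    while frontier:
        query, depth = frontier.popleft()
        if query in seen or depth > max_depth:                 # dedupe and bound the hops
            continue
        seen.add(query)
        hits = semantic_search(query)                          # base-layer search tool
        findings.extend(hits)
        for follow_up in generate_followups(question, hits):   # multi-hop step
            frontier.append((follow_up, depth + 1))
    return synthesize(question, findings)                      # map-reduce synthesis
```

Because relationships are discovered at query time by following what each search returns, there is no graph database to build or keep in sync; that is what makes the Graph RAG behavior "virtual".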

What this means for you:

  • Use what you need: Direct semantic/regex search for quick lookups, Code Research for architectural exploration
  • Zero upfront cost: No entity extraction, no graph database to maintain
  • Query-adaptive: Simple questions get fast answers, complex questions trigger deep exploration automatically
  • Scales to monorepos: Orchestration layer adapts exploration depth and synthesis budgets to codebase size

Compare approaches:

| Approach | Base Capability | Orchestration | Monorepo Scale | Maintenance |
| --- | --- | --- | --- | --- |
| Keyword Search | Exact matching | None | ✓ Fast | None |
| Traditional RAG | Semantic search | None | ✓ Scales | Re-index files |
| Knowledge Graphs | Relationship queries | Pre-computed | ✗ Expensive | Continuous sync |
| ChunkHound | Semantic + Regex | Code Research sub-agent | ✓ Automatic | Automatic (incremental + realtime) |

Battle-tested at monorepo scale:

  • Millions of lines across multi-language codebases
  • 29 languages and formats with AST-aware parsing (Python, TypeScript, Go, Rust, C++, Java, and more)
  • 5 minutes from installation to first deep research query
  • Zero cloud dependencies - your code stays local, searches stay fast
  • Automatic scaling - token budgets and exploration depth adapt to repository size

Ideal for:

  • Large monorepos with cross-team dependencies and circular references
  • Multi-language projects requiring consistent search across all code
  • Security-sensitive codebases that can’t use cloud-based code search
  • Offline development environments or air-gapped systems

Built on proven foundations:
Tree-sitter for parsing • DuckDB for local vector search • MCP for AI integration

Stop recreating code. Start with deep understanding.

Latest Updates

Stay up to date with ChunkHound's latest features and improvements.

Scalable Code Analysis

Map-reduce synthesis breaks complex queries into parallel subtasks, preventing context collapse on multi-million LOC codebases (see the sketch after this list).
  • Numbered citations [1][2][3] replace verbose file.py:123 references
  • New chunkhound research CLI command for direct code analysis
  • Automatic query expansion with deduplication casts wider semantic nets
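
A rough sketch of the map-reduce step, with summarize standing in for an LLM call and the group size chosen arbitrarily for the example:

```python
from typing import Callable

def map_reduce_synthesis(question: str,
                         findings: list[str],
                         summarize: Callable[[str, list[str]], str],
                         group_size: int = 20) -> str:
    """Summarize findings in independent groups (map), then merge the partial
    summaries (reduce), so no single prompt has to hold everything at once."""
    if not findings:
        return ""
    # Map: summarize each group independently (parallelizable in practice).
    partial = [summarize(question, findings[i:i + group_size])
               for i in range(0, len(findings), group_size)]
    # Reduce: fold the partial summaries until one answer remains.
    while len(partial) > 1:
        partial = [summarize(question, partial[i:i + group_size])
                   for i in range(0, len(partial), group_size)]
    return partial[0]
```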

Indexing Performance

10-100x faster indexing via native git bindings, parallel directory discovery, and ProcessPoolExecutor for CPU-bound parsing.
  • RapidYAML parser handles large k8s manifests 10-100x faster than tree-sitter
  • 7 new AST-aware parsers: Swift, Objective-C, Zig, Haskell, HCL, Vue, PHP (29+ total)
  • Provider-aware embedding batching optimizes API throughput (OpenAI: 8, VoyageAI: 40), sketched below
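
A simple illustration of provider-aware batching; the per-provider sizes mirror the numbers above, while the function itself and the fallback size are assumptions for the example.

```python
from typing import Iterator

# Per-provider batch sizes taken from the release note above.
PROVIDER_BATCH_SIZES = {"openai": 8, "voyageai": 40}

def embedding_batches(texts: list[str], provider: str) -> Iterator[list[str]]:
    """Yield embedding requests sized to the provider's optimal batch."""
    size = PROVIDER_BATCH_SIZES.get(provider, 16)   # fallback size is an assumption
    for i in range(0, len(texts), size):
        yield texts[i:i + size]
```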

Production Tooling

New CLI commands and integrations for production workflows and debugging.
  • simulate (dry-run), diagnose (compare ChunkHound vs git rules), calibrate (auto-tune batch sizes)
  • TEI reranker format support - two-stage retrieval with cross-encoder, no vendor lock-in (see the sketch after this list)
  • Repo-aware gitignore engine prevents rule leakage between sibling repos
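
A sketch of the two-stage pattern that reranker support enables: a fast vector search produces candidates, then a cross-encoder scores each query-candidate pair. Here vector_search and rerank_scores are stand-ins for the HNSW index and a TEI-compatible reranker endpoint, and the candidate counts are arbitrary.

```python
from typing import Callable

def two_stage_retrieve(query: str,
                       vector_search: Callable[[str, int], list[str]],
                       rerank_scores: Callable[[str, list[str]], list[float]],
                       first_k: int = 50,
                       final_k: int = 10) -> list[str]:
    """Stage 1 trades precision for recall; stage 2 reranks the short list with
    a cross-encoder that sees the query and each candidate together."""
    candidates = vector_search(query, first_k)        # stage 1: fast approximate recall
    scores = rerank_scores(query, candidates)         # stage 2: precise pairwise scoring
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:final_k]]
```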