
How-To Guides

Once you’ve got ChunkHound up and running, it’s time to dive deeper. This guide covers advanced indexing strategies, server deployment, and optimization for large codebases.

For large codebases, indexing is a separate step that provides significant benefits:

Performance

Index once, search many times. Initial indexing takes time, but subsequent searches are instant.

Smart Diffing

Only processes changed files, preserving embeddings for unchanged code.

Fix Command

Repairs inconsistencies: chunkhound index detects and fixes database drift.

Enterprise Ready

Battle-tested scaling, used on codebases with millions of lines of code.

Here's what an initial full index looks like:

Terminal window
$ chunkhound index /path/to/large-codebase
Scanning 10,000 files...
Processing 8,234 Python files, 1,766 TypeScript files...
45,000 chunks indexed
Embeddings: 45,000 generated
⏱️ Time: 7m 30s
Re-run the same command after editing a few files and smart diffing takes over:

Terminal window
$ chunkhound index # After editing 3 files
Detecting changes...
3 files modified, 9,997 files unchanged
150 chunks updated
Embeddings: 150 generated, 44,850 reused
⏱️ Time: 18 seconds
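
In numbers: 44,850 of 45,000 embeddings (99.7%) were reused, and wall-clock time dropped from 7m 30s to 18 seconds, roughly a 25x speedup for a three-file change.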

When using MCP servers, ChunkHound automatically watches your files and updates the index as you edit. No manual commands needed:

  • Edit files → Index updates automatically
  • Switch git branches → Only changed files get re-indexed
  • Add new files → Automatically detected and indexed
  • Delete files → Automatically removed from index

This makes ChunkHound a good fit for live memory systems: keep a folder of markdown notes indexed, and it stays searchable as you add and modify content.
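
A minimal sketch of that workflow, using the commands documented in this guide (the ~/notes path is just an example):

Terminal window
# One-time setup: index the notes folder.
$ chunkhound index ~/notes
# Serve it over MCP; file watching keeps the index current as notes change.
$ chunkhound mcp ~/notes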

Use Case               Mode    Command
Personal development   stdio   chunkhound mcp
Team/production use    HTTP    chunkhound mcp --http

In stdio mode, your IDE starts and stops the server automatically, and the index stays in memory for instant searches. This is the right fit for personal development with a single IDE.

Terminal window
chunkhound mcp /path/to/project
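
Registering the stdio server with an MCP client usually means adding an entry to the client's config file. The file name and schema below are an illustrative assumption (many clients use an "mcpServers" JSON map), not ChunkHound-documented configuration; check your IDE's MCP docs for the exact format:

Terminal window
# Illustrative only: config file name and schema vary by MCP client.
$ cat .mcp.json
{
  "mcpServers": {
    "chunkhound": {
      "command": "chunkhound",
      "args": ["mcp", "/path/to/project"]
    }
  }
}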

In HTTP mode, you start the server once and multiple IDEs can connect. This is ideal for teams, or when switching between multiple git worktrees.

Terminal window
chunkhound mcp /path/to/project --http --port 8000
# Connect IDEs to http://localhost:8000
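
On the client side, many MCP clients accept a URL entry for HTTP servers. As above, this is a hedged sketch; the key name ("url" here) varies by client:

Terminal window
# Illustrative only: point each client at the shared HTTP server.
$ cat .mcp.json
{
  "mcpServers": {
    "chunkhound": {
      "url": "http://localhost:8000"
    }
  }
}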

ChunkHound is production-ready and actively tested on codebases with 200,000+ lines of code. Here’s how to deploy it effectively:

Scale & Performance

4.8M lines in 56 minutes

Proven performance on massive codebases with minimal CPU overhead

Project Diversity

Battle-tested across architectures

From GoatDB’s TypeScript monolith to Kubernetes’ multi-language ecosystem

Multi-Language Support

22+ Languages

Python, TypeScript, Go, Rust, Java, C++, and more via Tree-sitter

AI-Built Architecture

100% AI-Generated

The entire codebase was written by AI agents, using the cAST algorithm for intelligent code chunking

We indexed the entire Kubernetes codebase - 4.8 million lines across 23,000 files. Here’s what happened on a MacBook Pro M4:

56 minutes. That’s it.

But here’s the surprising part: ChunkHound barely broke a sweat. CPU usage stayed at 47% the entire time. Your laptop wasn’t struggling - it was waiting. Waiting for the embedding API to process each chunk and send it back.

This tells you something important about scaling ChunkHound: the bottleneck isn’t your hardware, it’s your embedding provider.
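
Rough math from that run backs this up: 4.8 million lines in 56 minutes is about 86,000 lines per minute, sustained while the CPU sat at 47%. The parser could have gone faster; it was pacing itself to the embedding API's responses.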

Want it faster?

  • Local models eliminate the network round-trip entirely
  • On-premise servers give you control over throughput
  • Smaller indexes for daily work: index just the modules you're actively developing (see the sketch below)
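
The last point uses the same command form as the rest of this guide; the module path is illustrative:

Terminal window
# Index just the module you're actively developing.
$ chunkhound index ./services/auth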

The Kubernetes test proves ChunkHound can handle anything you throw at it. But most days, you won’t need to index millions of lines. Start with what you’re working on, expand as needed.