
How-To Guides

Once you’ve got ChunkHound up and running, it’s time to dive deeper. This guide covers advanced indexing strategies, server deployment, and optimization for large codebases.

For large codebases, indexing is a separate step that provides significant benefits:

Performance

Index once, search many times. Initial indexing takes time, but subsequent searches are instant.

Smart Diffing

Only processes changed files, preserving embeddings for unchanged code.

Fix Command

Repairs inconsistencies: chunkhound index detects and fixes database drift.

Enterprise Ready

Battle-tested scaling, used on codebases with millions of lines of code.

Here's what an initial full index looks like:

Terminal window
$ chunkhound index /path/to/large-codebase
Scanning 10,000 files...
Processing 8,234 Python files, 1,766 TypeScript files...
45,000 chunks indexed
Embeddings: 45,000 generated
⏱️ Time: 7m 30s
Re-run the same command after editing a few files and smart diffing takes over:

Terminal window
$ chunkhound index # After editing 3 files
Detecting changes...
3 files modified, 9,997 files unchanged
150 chunks updated
Embeddings: 150 generated, 44,850 reused
⏱️ Time: 18 seconds
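
In numbers: 44,850 of 45,000 embeddings (99.7%) were reused, and wall-clock time dropped from 7m 30s to 18 seconds, roughly a 25x speedup for a three-file change.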

When using MCP servers, ChunkHound automatically watches your files and updates the index as you edit. No manual commands needed:

  • Edit files → Index updates automatically
  • Switch git branches → Only changed files get re-indexed
  • Add new files → Automatically detected and indexed
  • Delete files → Automatically removed from index

This makes ChunkHound a good fit for live memory systems: keep a folder of markdown notes indexed, and it stays searchable as you add and modify content.
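
A minimal sketch of that workflow, using the commands documented in this guide (the ~/notes path is just an example):

Terminal window
# One-time setup: index the notes folder.
$ chunkhound index ~/notes
# Serve it over MCP; file watching keeps the index current as notes change.
$ chunkhound mcp ~/notes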

Use Case               Mode    Command
Personal development   stdio   chunkhound mcp
Team/production use    HTTP    chunkhound mcp --http

In stdio mode, your IDE starts and stops the server automatically, and the index stays in memory for instant searches. This is the right fit for personal development with a single IDE.

Terminal window
chunkhound mcp /path/to/project
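
Registering the stdio server with an MCP client usually means adding an entry to the client's config file. The file name and schema below are an illustrative assumption (many clients use an "mcpServers" JSON map), not ChunkHound-documented configuration; check your IDE's MCP docs for the exact format:

Terminal window
# Illustrative only: config file name and schema vary by MCP client.
$ cat .mcp.json
{
  "mcpServers": {
    "chunkhound": {
      "command": "chunkhound",
      "args": ["mcp", "/path/to/project"]
    }
  }
}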

In HTTP mode, you start the server once and multiple IDEs can connect. This is ideal for teams, or when switching between multiple git worktrees.

Terminal window
chunkhound mcp /path/to/project --http --port 8000
# Connect IDEs to http://localhost:8000
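
On the client side, many MCP clients accept a URL entry for HTTP servers. As above, this is a hedged sketch; the key name ("url" here) varies by client:

Terminal window
# Illustrative only: point each client at the shared HTTP server.
$ cat .mcp.json
{
  "mcpServers": {
    "chunkhound": {
      "url": "http://localhost:8000"
    }
  }
}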

ChunkHound is production-ready and actively tested on codebases with 200,000+ lines of code. Here’s how to deploy it effectively:

Scale & Performance

4.8M lines in 56 minutes

Proven performance on massive codebases with minimal CPU overhead

Project Diversity

Battle-tested across architectures

From GoatDB’s TypeScript monolith to Kubernetes’ multi-language ecosystem

Multi-Language Support

22+ Languages

Python, TypeScript, Go, Rust, Java, C++, and more via Tree-sitter

AI-Built Architecture

100% AI-Generated

The entire codebase was written by AI agents, using the cAST algorithm for intelligent code chunking

We indexed the entire Kubernetes codebase - 4.8 million lines across 23,000 files. Here’s what happened on a MacBook Pro M4:

56 minutes. That’s it.

But here’s the surprising part: ChunkHound barely broke a sweat. CPU usage stayed at 47% the entire time. Your laptop wasn’t struggling - it was waiting. Waiting for the embedding API to process each chunk and send it back.

This tells you something important about scaling ChunkHound: the bottleneck isn’t your hardware, it’s your embedding provider.
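
Rough math from that run backs this up: 4.8 million lines in 56 minutes is about 86,000 lines per minute, sustained while the CPU sat at 47%. The parser could have gone faster; it was pacing itself to the embedding API's responses.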

Want it faster?

  • Local models eliminate the network round-trip entirely
  • On-premise servers give you control over throughput
  • Smaller indexes for daily work: index just the modules you're actively developing (see the sketch below)
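
The last point uses the same command form as the rest of this guide; the module path is illustrative:

Terminal window
# Index just the module you're actively developing.
$ chunkhound index ./services/auth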

The Kubernetes test proves ChunkHound can handle anything you throw at it. But most days, you won’t need to index millions of lines. Start with what you’re working on, expand as needed.