
Code Research

Architectural Understanding, Not Just Search Results


Ask: “How does authentication work?”

Don’t get a list of files containing “auth.” Get a comprehensive report mapping auth components, relationships, security patterns, and configuration—with file.ts:45 citations.

Code Research performs breadth-first exploration of your codebase’s semantic graph, following connections between components and synthesizing findings into structured markdown reports.


If you previously configured the “Code Expert Agent” in .claude/agents/code-expert.md:

  • No longer needed - Code Research is now a built-in MCP tool, not a separate agent
  • LLM configuration now required - Code Research needs LLM provider configuration for synthesis and analysis
  • Same functionality - Deep architectural research works the same way, just integrated directly into ChunkHound

1. Add LLM configuration to .chunkhound.json:

```json
{
  "llm": {
    "provider": "claude-code-cli"
  }
}
```

2. Remove the old agent file:

```bash
rm .claude/agents/code-expert.md
```

3. Restart your MCP server:

```bash
# Stop current server (Ctrl+C if running in terminal)
# Restart ChunkHound MCP server
chunkhound mcp
```

4. Verify it works:

Ask your AI assistant: “Research the authentication implementation”

You should see Code Research tool invocation instead of agent delegation.

No. The old agent file serves no purpose anymore. If you keep both, Claude Code may try to use the agent instead of the built-in tool, so remove the agent file to ensure a clean migration.

Code Research features won’t work without LLM configuration. However, the base semantic and regex search tools continue to work without it.

Code Research pays off whenever a task needs broad architectural context:

  • Before implementing features - Find existing patterns to reuse instead of reinventing
  • During debugging - Map complete flows to find the actual failure point
  • Refactoring prep - Understand all dependencies before making changes
  • Code archaeology - Learn unfamiliar systems quickly

Code Research is designed for architectural exploration; simpler queries are better served by the base search tools directly. Match the tool to the question:

  • Quick symbol lookups - Use regex search to find all occurrences of a specific function or class name
  • Known file/function - Use semantic search when you know roughly what you’re looking for
  • Architectural questions - Use Code Research to understand how components interact and why

Via CLI:

```bash
chunkhound research "how does rate limiting work?"
```

Via MCP (with your AI assistant):

"Research our rate limiting implementation"

Code Research returns a structured markdown report with architectural insights and precise file citations. Here’s what a typical report looks like:

Example: Rate Limiting Research Output
## Rate Limiting Architecture
### Overview
The application implements token bucket rate limiting using Redis for distributed state.
Rate limiting is applied at the middleware layer with per-endpoint configuration.
### Core Components
**RateLimitMiddleware** (`src/middleware/ratelimit.ts:45-120`)
- Token bucket algorithm with sliding window
- Redis-based distributed counters
- Custom headers for limit status
- Applied to 12 API endpoints
**Configuration** (`config/limits.yaml:1-30`)
- Per-endpoint rate definitions
- Default: 100 requests per 15-minute window
- Environment-based overrides supported
### Usage Pattern
Found across these endpoints:
- `POST /api/auth/login` - 5 requests/min (src/routes/auth.ts:23)
- `POST /api/users/create` - 10 requests/min (src/routes/users.ts:45)
- `GET /api/data/*` - 100 requests/min (src/routes/data.ts:67)
### Implementation Recommendation
Reuse existing middleware for new endpoints:
```typescript
app.use('/api/new-endpoint', rateLimiter({
  windowMs: 15 * 60 * 1000,
  max: 100
}));
```
### Key Files
- `src/middleware/ratelimit.ts` - Core implementation
- `src/services/redis.ts:89-145` - Redis client
- `config/limits.yaml` - Configuration
- `tests/middleware/ratelimit.test.ts` - Test examples

Parameters:

  • query (required) - Your research question

The report includes:

  • Architectural overview and design patterns
  • Component locations with file.ts:line citations
  • Usage examples from your codebase
  • Implementation recommendations

Code Research requires an LLM provider for intelligent synthesis and query expansion. ChunkHound uses a dual-provider architecture:

  • Utility Provider - Fast operations: query expansion, follow-up generation
  • Synthesis Provider - Deep analysis: final synthesis with large context windows

Quick setup examples:

Claude Code CLI (recommended for Claude Code users):

```json
{
  "llm": {
    "provider": "claude-code-cli"
  }
}
```

Codex CLI (recommended for Codex users):

```json
{
  "llm": {
    "provider": "codex-cli",
    "codex_reasoning_effort": "medium"
  }
}
```

OpenAI (for users without CLI subscriptions):

```json
{
  "llm": {
    "provider": "openai",
    "api_key": "sk-your-key"
  }
}
```

For complete setup instructions including environment variables, mixed providers, and all configuration options, see the LLM Configuration section of the Configuration guide.


Code Research is a specialized sub-agent system optimized for code understanding. Unlike simple semantic search that returns matching chunks, it performs breadth-first exploration of your codebase’s semantic graph, following connections and understanding architectural relationships.

The system combines:

  • Multi-hop semantic search: Starting from your query, it expands outward through semantic relationships, exploring connected components
  • Hybrid semantic + symbol search: Discovers conceptually relevant code, then finds all exact symbol references for comprehensive coverage
  • Intelligent synthesis: Generates structured markdown reports with architectural insights and precise file:line citations

Token budgets scale with repository size (30k-150k input tokens), and the system automatically allocates resources based on what it discovers.

For deep implementation details, see the Advanced: Technical Deep Dive section below or the Under the Hood documentation.


Starting from your query, the system expands outward through semantic relationships:

```
Query: "authentication error handling"

Level 0: Direct matches
  → auth_error_handler()
  → validate_credentials()

Level 1: Connected components (semantic neighbors)
  → error_logger()     (shares error handling patterns)
  → token_validator()  (shares auth validation logic)

Level 2: Architectural relationships
  → database_retry()   (error logger uses it)
  → session_cleanup()  (token validator calls it)
```

At each level, an LLM generates context-aware follow-up questions to explore promising directions, turning semantic search into guided exploration of architectural connections.
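To make the loop concrete, here is a minimal Python sketch of that breadth-first expansion. The `semantic_search` and `generate_followup_queries` callables, the `Chunk` record, and the score cutoff are hypothetical stand-ins for illustration, not ChunkHound's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    id: str
    file: str
    text: str
    score: float                      # relevance score assigned by search/reranking
    symbols: list[str] = field(default_factory=list)

def research_bfs(query, semantic_search, generate_followup_queries,
                 max_depth=3, min_score=0.35):
    """Breadth-first exploration: expand outward until relevance degrades."""
    found: dict[str, Chunk] = {}
    frontier = [query]                # level 0 starts from the user's question
    for _level in range(max_depth):
        next_frontier = []
        for q in frontier:
            hits = semantic_search(q)                       # -> list[Chunk]
            fresh = [c for c in hits if c.score >= min_score and c.id not in found]
            if not fresh:
                continue                                    # convergence: this branch is exhausted
            found.update({c.id: c for c in fresh})
            # An LLM turns newly discovered code into targeted follow-up questions,
            # which become the queries for the next BFS level.
            next_frontier.extend(generate_followup_queries(q, fresh))
        if not next_frontier:
            break
        frontier = next_frontier
    return found
```

The production system also layers in reranking and token budgets as described below; the sketch only shows the traversal skeleton.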

Traditional Graph RAG systems build explicit knowledge graphs—extracting entities, mining relationships, and storing them in graph databases. Code Research approximates graph-like exploration through orchestration, trading explicit relationship modeling for zero upfront cost and automatic scaling.

ChunkHound’s base layer (cAST index + semantic/regex search) provides traditional RAG capabilities. The Code Research sub-agent orchestrates these tools to create Graph RAG behavior:

Base Layer Foundation:

  • Chunks as nodes: cAST chunking preserves metadata (function names, class hierarchies, parameters, imports)
  • Vector similarity as edges: Semantic search finds conceptually related chunks via HNSW index
  • Symbol references as edges: Regex search finds all exact symbol occurrences

Orchestration Layer Creates the Graph:

  • BFS traversal: Iteratively calls semantic search, starting from initial results and expanding through related chunks
  • Query expansion: Generates multiple semantic entry points, exploring different “neighborhoods” in parallel
  • Symbol extraction + regex: Pulls symbols from semantic results, triggers parallel regex to find all references
  • Follow-up questions: Creates targeted queries based on discovered code, recursively exploring architectural boundaries
  • Convergence detection: Monitors score degradation to prevent infinite traversal

Because cAST chunks preserve semantic boundaries, multi-hop expansion follows meaningful architectural connections rather than arbitrary text proximity. This structural awareness is why orchestration can approximate graph traversal—the base chunks already encode relationships that orchestration discovers through iterative search.

The virtual graph emerges through orchestrated tool use, not pre-computed storage:

  • Initial semantic search → discovers conceptually relevant chunks
  • Multi-hop expansion → follows vector similarity “edges” through BFS
  • Symbol extraction → identifies key entities from high-relevance results
  • Regex search → finds all references, completing the “graph” of connections
  • Follow-ups → explores architectural relationships discovered in results

This approach scales efficiently to multi-million LOC repositories because there’s no explicit graph to maintain—the “graph” is the pattern of orchestrated search calls, adapted dynamically to each query’s needs.

After each semantic search finds conceptually relevant chunks, the system extracts symbols (function names, class names, parameter names) and runs parallel regex searches to find every occurrence of those symbols across the codebase.

This hybrid approach combines:

  • Semantic search: Discovers what’s conceptually relevant (understanding)
  • Regex search: Finds all exact symbol references (precision)

The results are unified through simple deduplication by chunk ID. Semantic results retain their reranked relevance scores from the multi-hop search phase, while regex results add new chunks containing exact symbol matches that weren’t discovered semantically. This gives you comprehensive coverage: the semantic “why this matters” plus the regex “everywhere this appears.” Since regex is a local database operation, this adds zero API costs while providing more complete results.
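A schematic version of that hybrid step, reusing the hypothetical `Chunk` type from the earlier sketch (the `regex_search` callable and symbol handling are likewise illustrative, not ChunkHound's API):

```python
def hybrid_expand(semantic_chunks: list[Chunk], regex_search) -> list[Chunk]:
    """Augment semantic hits with exact symbol references, deduplicated by chunk id."""
    merged: dict[str, Chunk] = {c.id: c for c in semantic_chunks}   # keep reranked scores
    symbols = {s for c in semantic_chunks for s in c.symbols}       # names worth chasing
    for symbol in symbols:
        for hit in regex_search(symbol):                            # local DB query, no API cost
            merged.setdefault(hit.id, hit)     # add only chunks semantic search missed
    return list(merged.values())
```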

Traditional semantic search finds conceptually similar code but misses architectural relationships. Knowledge graphs model these relationships explicitly but require expensive upfront extraction and ongoing maintenance.

Code Research combines base search capabilities (semantic + regex) with intelligent orchestration:

  1. Query expansion - Multiple semantic entry points discover different code neighborhoods
  2. Multi-hop exploration - BFS through semantic neighborhoods following architectural connections
  3. Symbol extraction + regex - Comprehensive coverage beyond semantic discovery
  4. Follow-up generation - Context-aware questions explore architectural boundaries
  5. Adaptive scaling - Token budgets (30k-150k) scale with codebase size
  6. Map-reduce synthesis - Parallel cluster synthesis with deterministic citation remapping

The virtual graph emerges through orchestrated tool use—no upfront construction, no separate storage, no synchronization overhead. Query-adaptive orchestration scales from quick searches to deep architectural exploration automatically.

Token budgets scale with repository size (30k-150k input tokens) and traversal depth (shallow→deep). The system automatically allocates resources based on what it’s discovering.

Small result sets use single-pass synthesis (one LLM call). Large result sets trigger map-reduce synthesis (cluster chunks, synthesize clusters, combine summaries). Output is always a structured markdown report with architectural insights and file.ts:45 citations.
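Sketched as a routing decision (the token heuristic, threshold, and call names are assumptions for illustration only):

```python
def estimate_tokens(chunk: Chunk) -> int:
    return max(1, len(chunk.text) // 4)        # crude ~4 chars/token heuristic

def synthesize(chunks: list[Chunk], llm, cluster_fn, budget: int = 30_000) -> str:
    if sum(estimate_tokens(c) for c in chunks) <= budget:
        return llm.synthesize(chunks)                       # single-pass: one LLM call
    clusters = cluster_fn(chunks, max_tokens=budget)        # map-reduce path
    summaries = [llm.synthesize(cluster) for cluster in clusters]   # map (parallel in practice)
    return llm.combine(summaries)                           # reduce
```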

Code research doesn’t blindly pass all collected chunks to synthesis. After BFS exploration completes, the system performs a final reranking pass against your original query to filter for quality and relevance:

  1. File-level reranking: All discovered files are reranked using the reranker model against your original question
  2. Token budget allocation: Files are prioritized by relevance score, and only the highest-scoring files fit within the synthesis token budget
  3. Chunk filtering: Only chunks from budgeted files make it to the final synthesis

This implements a classic precision-recall tradeoff—cast a wide net during exploration (maximize recall), then filter for quality before synthesis (maximize precision). Low-relevance findings are excluded, ensuring the LLM synthesizes only the most pertinent architectural insights.
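The filtering pass might look roughly like this, again with hypothetical names (`rerank`, the file-to-chunks mapping) and reusing `estimate_tokens` from the sketch above:

```python
def select_for_synthesis(files: dict[str, list[Chunk]], rerank, query: str,
                         budget: int) -> list[Chunk]:
    """Keep only chunks from the most relevant files that fit the synthesis budget."""
    scores = rerank(query, list(files))        # file path -> relevance vs the original query
    selected, used = [], 0
    for path in sorted(files, key=lambda p: scores[p], reverse=True):
        cost = sum(estimate_tokens(c) for c in files[path])
        if used + cost > budget:
            continue                           # low priority or too large: excluded
        selected.extend(files[path])
        used += cost
    return selected
```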

When semantic clustering produces multiple clusters from filtered results, the system uses two-phase HDBSCAN clustering with map-reduce synthesis to prevent context collapse:

  1. Phase 1 (Natural Boundary Discovery): HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) discovers natural semantic boundaries in the embedding space, grouping files where they are cohesively related rather than forcing arbitrary partitions. This respects the inherent structure of your codebase, identifying both semantically dense clusters and outliers that don’t fit natural groupings.

  2. Phase 2 (Token-Budget Grouping): Clusters are greedily merged based on centroid distance while respecting the 30k token limit per cluster, preserving semantic coherence during merging.
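Assuming files are embedded and token-costed up front, the two phases above could be sketched like this. HDBSCAN here is the real `hdbscan` package, but `min_cluster_size=2`, the solo-outlier handling, and the merge loop are illustrative choices, not ChunkHound's actual parameters:

```python
import numpy as np
import hdbscan

def cluster_files(embeddings: np.ndarray,      # shape (n_files, dim), one row per file
                  token_costs: list[int],      # estimated token cost per file
                  max_tokens: int = 30_000) -> list[list[int]]:
    # Phase 1: let HDBSCAN find natural semantic boundaries; label -1 marks outliers.
    labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(embeddings)
    groups = [[i for i, lab in enumerate(labels) if lab == g]
              for g in sorted(set(labels)) if g != -1]
    groups += [[i] for i, lab in enumerate(labels) if lab == -1]   # outliers start alone

    def centroid(g): return embeddings[g].mean(axis=0)
    def cost(g): return sum(token_costs[i] for i in g)

    # Phase 2: repeatedly merge the closest pair of clusters that still fits the budget.
    merged = True
    while merged and len(groups) > 1:
        merged = False
        cents = [centroid(g) for g in groups]
        pairs = sorted((float(np.linalg.norm(cents[a] - cents[b])), a, b)
                       for a in range(len(groups)) for b in range(a + 1, len(groups)))
        for _, a, b in pairs:
            if cost(groups[a]) + cost(groups[b]) <= max_tokens:
                groups[a] += groups.pop(b)
                merged = True
                break
    return groups
```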

Files are partitioned into token-bounded clusters, synthesized in parallel with cluster-local citations [1][2][3], then deterministically remapped to global numbers before the reduce phase combines summaries.

This avoids progressive compression loss from iterative summarization chains (summary → summary-of-summary). Each cluster synthesizes once with full context, preserving architectural details while enabling arbitrary scaling. Cluster-local citation namespaces enable maximum parallelism—no coordination needed during map phase. The reduce LLM integrates remapped summaries with explicit instructions to preserve citations (not generate new ones), ensuring every [N] traces to actual source files.
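The remapping itself is mechanical string rewriting, roughly like the following sketch (the function and argument names are assumptions for illustration):

```python
import re

def remap_citations(cluster_summaries: list[str],
                    cluster_sources: list[list[str]]) -> tuple[str, list[str]]:
    """Shift each cluster's local [1][2][3] citations into one global namespace."""
    global_sources: list[str] = []
    remapped: list[str] = []
    for summary, sources in zip(cluster_summaries, cluster_sources):
        offset = len(global_sources)          # citations in this cluster shift by this much
        global_sources.extend(sources)
        remapped.append(re.sub(r"\[(\d+)\]",
                               lambda m: f"[{int(m.group(1)) + offset}]",
                               summary))
    return "\n\n".join(remapped), global_sources
```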

Result: 10KB repos use single-pass synthesis (k=1), 1M+ LOC repos automatically scale to map-reduce (k=5+) without context collapse or citation hallucination.