
Code Research

Architectural Understanding, Not Just Search Results


Ask: “How does authentication work?”

Don’t get a list of files containing “auth.” Get a comprehensive report mapping auth components, relationships, security patterns, and configuration—with file.ts:45 citations.

Code Research performs breadth-first exploration of your codebase’s semantic graph, following connections between components and synthesizing findings into structured markdown reports.


If you previously configured the “Code Expert Agent” in .claude/agents/code-expert.md:

  • No longer needed - Code Research is now a built-in MCP tool, not a separate agent
  • LLM configuration now required - Code Research needs LLM provider configuration for synthesis and analysis
  • Same functionality - Deep architectural research works the same way, just integrated directly into ChunkHound

1. Add LLM configuration to .chunkhound.json:

```json
{
  "llm": {
    "provider": "claude-code-cli"
  }
}
```

2. Remove the old agent file:

```bash
rm .claude/agents/code-expert.md
```

3. Restart your MCP server:

```bash
# Stop current server (Ctrl+C if running in terminal)
# Restart ChunkHound MCP server
chunkhound mcp
```

4. Verify it works:

Ask your AI assistant: “Research the authentication implementation”

You should see Code Research tool invocation instead of agent delegation.

No. The old agent file serves no purpose anymore. If you keep both, Claude Code may try to use the agent instead of the built-in tool, so remove the agent file to ensure a clean migration.

Code Research features won’t work without LLM configuration. However, the base semantic and regex search tools continue to work without it.

Code Research pays off whenever a task needs broad architectural context:

  • Before implementing features - Find existing patterns to reuse instead of reinventing
  • During debugging - Map complete flows to find the actual failure point
  • Refactoring prep - Understand all dependencies before making changes
  • Code archaeology - Learn unfamiliar systems quickly

Code Research is designed for architectural exploration; simpler queries are better served by the base search tools directly. Match the tool to the question:

  • Quick symbol lookups - Use regex search to find all occurrences of a specific function or class name
  • Known file/function - Use semantic search when you know roughly what you’re looking for
  • Architectural questions - Use Code Research to understand how components interact and why

Via CLI:

```bash
chunkhound research "how does rate limiting work?"
```

Via MCP (with your AI assistant):

"Research our rate limiting implementation"

Code Research returns a structured markdown report with architectural insights and precise file citations. Here’s what a typical report looks like:

Example: Rate Limiting Research Output
## Rate Limiting Architecture
### Overview
The application implements token bucket rate limiting using Redis for distributed state.
Rate limiting is applied at the middleware layer with per-endpoint configuration.
### Core Components
**RateLimitMiddleware** (`src/middleware/ratelimit.ts:45-120`)
- Token bucket algorithm with sliding window
- Redis-based distributed counters
- Custom headers for limit status
- Applied to 12 API endpoints
**Configuration** (`config/limits.yaml:1-30`)
- Per-endpoint rate definitions
- Default: 100 requests per 15-minute window
- Environment-based overrides supported
### Usage Pattern
Found across these endpoints:
- `POST /api/auth/login` - 5 requests/min (src/routes/auth.ts:23)
- `POST /api/users/create` - 10 requests/min (src/routes/users.ts:45)
- `GET /api/data/*` - 100 requests/min (src/routes/data.ts:67)
### Implementation Recommendation
Reuse existing middleware for new endpoints:
```typescript
app.use('/api/new-endpoint', rateLimiter({
  windowMs: 15 * 60 * 1000,
  max: 100
}));
```
### Key Files
- `src/middleware/ratelimit.ts` - Core implementation
- `src/services/redis.ts:89-145` - Redis client
- `config/limits.yaml` - Configuration
- `tests/middleware/ratelimit.test.ts` - Test examples

Parameters:

  • query (required) - Your research question

The report includes:

  • Architectural overview and design patterns
  • Component locations with file.ts:line citations
  • Usage examples from your codebase
  • Implementation recommendations

Code Research requires an LLM provider for intelligent synthesis and query expansion. ChunkHound uses a dual-provider architecture:

  • Utility Provider - Fast operations: query expansion, follow-up generation
  • Synthesis Provider - Deep analysis: final synthesis with large context windows

Quick setup examples:

Claude Code CLI (recommended for Claude Code users):

```json
{
  "llm": {
    "provider": "claude-code-cli"
  }
}
```

Codex CLI (recommended for Codex users):

```json
{
  "llm": {
    "provider": "codex-cli",
    "codex_reasoning_effort": "medium"
  }
}
```

OpenAI (for users without CLI subscriptions):

```json
{
  "llm": {
    "provider": "openai",
    "api_key": "sk-your-key"
  }
}
```

For complete setup instructions including environment variables, mixed providers, and all configuration options, see the LLM Configuration section of the Configuration guide.


Code Research is a specialized sub-agent system optimized for code understanding. Unlike simple semantic search that returns matching chunks, it performs breadth-first exploration of your codebase’s semantic graph, following connections and understanding architectural relationships.

The system combines:

  • Multi-hop semantic search: Starting from your query, it expands outward through semantic relationships, exploring connected components
  • Hybrid semantic + symbol search: Discovers conceptually relevant code, then finds all exact symbol references for comprehensive coverage
  • Intelligent synthesis: Generates structured markdown reports with architectural insights and precise file:line citations

Token budgets scale with repository size (30k-150k input tokens), and the system automatically allocates resources based on what it discovers.

For deep implementation details, see the Advanced: Technical Deep Dive section below or the Under the Hood documentation.


Starting from your query, the system expands outward through semantic relationships:

```
Query: "authentication error handling"

Level 0: Direct matches
  → auth_error_handler()
  → validate_credentials()

Level 1: Connected components (semantic neighbors)
  → error_logger()     (shares error handling patterns)
  → token_validator()  (shares auth validation logic)

Level 2: Architectural relationships
  → database_retry()   (error logger uses it)
  → session_cleanup()  (token validator calls it)
```

At each level, an LLM generates context-aware follow-up questions to explore promising directions, turning semantic search into guided exploration of architectural connections.
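To make the loop concrete, here is a minimal Python sketch of that breadth-first expansion. The `semantic_search` and `generate_followup_queries` callables, the `Chunk` record, and the score cutoff are hypothetical stand-ins for illustration, not ChunkHound's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    id: str
    file: str
    text: str
    score: float                      # relevance score assigned by search/reranking
    symbols: list[str] = field(default_factory=list)

def research_bfs(query, semantic_search, generate_followup_queries,
                 max_depth=3, min_score=0.35):
    """Breadth-first exploration: expand outward until relevance degrades."""
    found: dict[str, Chunk] = {}
    frontier = [query]                # level 0 starts from the user's question
    for _level in range(max_depth):
        next_frontier = []
        for q in frontier:
            hits = semantic_search(q)                       # -> list[Chunk]
            fresh = [c for c in hits if c.score >= min_score and c.id not in found]
            if not fresh:
                continue                                    # convergence: this branch is exhausted
            found.update({c.id: c for c in fresh})
            # An LLM turns newly discovered code into targeted follow-up questions,
            # which become the queries for the next BFS level.
            next_frontier.extend(generate_followup_queries(q, fresh))
        if not next_frontier:
            break
        frontier = next_frontier
    return found
```

The production system also layers in reranking and token budgets as described below; the sketch only shows the traversal skeleton.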

Traditional Graph RAG systems build explicit knowledge graphs—extracting entities, mining relationships, and storing them in graph databases. Code Research approximates graph-like exploration through orchestration, trading explicit relationship modeling for zero upfront cost and automatic scaling.

ChunkHound’s base layer (cAST index + semantic/regex search) provides traditional RAG capabilities. The Code Research sub-agent orchestrates these tools to create Graph RAG behavior:

Base Layer Foundation:

  • Chunks as nodes: cAST chunking preserves metadata (function names, class hierarchies, parameters, imports)
  • Vector similarity as edges: Semantic search finds conceptually related chunks via HNSW index
  • Symbol references as edges: Regex search finds all exact symbol occurrences

Orchestration Layer Creates the Graph:

  • BFS traversal: Iteratively calls semantic search, starting from initial results and expanding through related chunks
  • Query expansion: Generates multiple semantic entry points, exploring different “neighborhoods” in parallel
  • Symbol extraction + regex: Pulls symbols from semantic results, triggers parallel regex to find all references
  • Follow-up questions: Creates targeted queries based on discovered code, recursively exploring architectural boundaries
  • Convergence detection: Monitors score degradation to prevent infinite traversal

Because cAST chunks preserve semantic boundaries, multi-hop expansion follows meaningful architectural connections rather than arbitrary text proximity. This structural awareness is why orchestration can approximate graph traversal—the base chunks already encode relationships that orchestration discovers through iterative search.

The virtual graph emerges through orchestrated tool use, not pre-computed storage:

  • Initial semantic search → discovers conceptually relevant chunks
  • Multi-hop expansion → follows vector similarity “edges” through BFS
  • Symbol extraction → identifies key entities from high-relevance results
  • Regex search → finds all references, completing the “graph” of connections
  • Follow-ups → explores architectural relationships discovered in results

This approach scales efficiently to multi-million LOC repositories because there’s no explicit graph to maintain—the “graph” is the pattern of orchestrated search calls, adapted dynamically to each query’s needs.

After each semantic search finds conceptually relevant chunks, the system extracts symbols (function names, class names, parameter names) and runs parallel regex searches to find every occurrence of those symbols across the codebase.

This hybrid approach combines:

  • Semantic search: Discovers what’s conceptually relevant (understanding)
  • Regex search: Finds all exact symbol references (precision)

The results are unified through simple deduplication by chunk ID. Semantic results retain their reranked relevance scores from the multi-hop search phase, while regex results add new chunks containing exact symbol matches that weren’t discovered semantically. This gives you comprehensive coverage: the semantic “why this matters” plus the regex “everywhere this appears.” Since regex is a local database operation, this adds zero API costs while providing more complete results.
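A schematic version of that hybrid step, reusing the hypothetical `Chunk` type from the earlier sketch (the `regex_search` callable and symbol handling are likewise illustrative, not ChunkHound's API):

```python
def hybrid_expand(semantic_chunks: list[Chunk], regex_search) -> list[Chunk]:
    """Augment semantic hits with exact symbol references, deduplicated by chunk id."""
    merged: dict[str, Chunk] = {c.id: c for c in semantic_chunks}   # keep reranked scores
    symbols = {s for c in semantic_chunks for s in c.symbols}       # names worth chasing
    for symbol in symbols:
        for hit in regex_search(symbol):                            # local DB query, no API cost
            merged.setdefault(hit.id, hit)     # add only chunks semantic search missed
    return list(merged.values())
```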

Traditional semantic search finds conceptually similar code but misses architectural relationships. Knowledge graphs model these relationships explicitly but require expensive upfront extraction and ongoing maintenance.

Code Research combines base search capabilities (semantic + regex) with intelligent orchestration:

  1. Query expansion - Multiple semantic entry points discover different code neighborhoods
  2. Multi-hop exploration - BFS through semantic neighborhoods following architectural connections
  3. Symbol extraction + regex - Comprehensive coverage beyond semantic discovery
  4. Follow-up generation - Context-aware questions explore architectural boundaries
  5. Adaptive scaling - Token budgets (30k-150k) scale with codebase size
  6. Map-reduce synthesis - Parallel cluster synthesis with deterministic citation remapping

The virtual graph emerges through orchestrated tool use—no upfront construction, no separate storage, no synchronization overhead. Query-adaptive orchestration scales from quick searches to deep architectural exploration automatically.

Token budgets scale with repository size (30k-150k input tokens) and traversal depth (shallow→deep). The system automatically allocates resources based on what it’s discovering.

Small result sets use single-pass synthesis (one LLM call). Large result sets trigger map-reduce synthesis (cluster chunks, synthesize clusters, combine summaries). Output is always a structured markdown report with architectural insights and file.ts:45 citations.
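Sketched as a routing decision (the token heuristic, threshold, and call names are assumptions for illustration only):

```python
def estimate_tokens(chunk: Chunk) -> int:
    return max(1, len(chunk.text) // 4)        # crude ~4 chars/token heuristic

def synthesize(chunks: list[Chunk], llm, cluster_fn, budget: int = 30_000) -> str:
    if sum(estimate_tokens(c) for c in chunks) <= budget:
        return llm.synthesize(chunks)                       # single-pass: one LLM call
    clusters = cluster_fn(chunks, max_tokens=budget)        # map-reduce path
    summaries = [llm.synthesize(cluster) for cluster in clusters]   # map (parallel in practice)
    return llm.combine(summaries)                           # reduce
```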

Code research doesn’t blindly pass all collected chunks to synthesis. After BFS exploration completes, the system performs a final reranking pass against your original query to filter for quality and relevance:

  1. File-level reranking: All discovered files are reranked using the reranker model against your original question
  2. Token budget allocation: Files are prioritized by relevance score, and only the highest-scoring files fit within the synthesis token budget
  3. Chunk filtering: Only chunks from budgeted files make it to the final synthesis

This implements a classic precision-recall tradeoff—cast a wide net during exploration (maximize recall), then filter for quality before synthesis (maximize precision). Low-relevance findings are excluded, ensuring the LLM synthesizes only the most pertinent architectural insights.
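The filtering pass might look roughly like this, again with hypothetical names (`rerank`, the file-to-chunks mapping) and reusing `estimate_tokens` from the sketch above:

```python
def select_for_synthesis(files: dict[str, list[Chunk]], rerank, query: str,
                         budget: int) -> list[Chunk]:
    """Keep only chunks from the most relevant files that fit the synthesis budget."""
    scores = rerank(query, list(files))        # file path -> relevance vs the original query
    selected, used = [], 0
    for path in sorted(files, key=lambda p: scores[p], reverse=True):
        cost = sum(estimate_tokens(c) for c in files[path])
        if used + cost > budget:
            continue                           # low priority or too large: excluded
        selected.extend(files[path])
        used += cost
    return selected
```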

When semantic clustering produces multiple clusters from filtered results, the system uses two-phase HDBSCAN clustering with map-reduce synthesis to prevent context collapse:

  1. Phase 1 (Natural Boundary Discovery): HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) discovers natural semantic boundaries in the embedding space, grouping files where they are cohesively related rather than forcing arbitrary partitions. This respects the inherent structure of your codebase, identifying both semantically dense clusters and outliers that don’t fit natural groupings.

  2. Phase 2 (Token-Budget Grouping): Clusters are greedily merged based on centroid distance while respecting the 30k token limit per cluster, preserving semantic coherence during merging.
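Assuming files are embedded and token-costed up front, the two phases above could be sketched like this. HDBSCAN here is the real `hdbscan` package, but `min_cluster_size=2`, the solo-outlier handling, and the merge loop are illustrative choices, not ChunkHound's actual parameters:

```python
import numpy as np
import hdbscan

def cluster_files(embeddings: np.ndarray,      # shape (n_files, dim), one row per file
                  token_costs: list[int],      # estimated token cost per file
                  max_tokens: int = 30_000) -> list[list[int]]:
    # Phase 1: let HDBSCAN find natural semantic boundaries; label -1 marks outliers.
    labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(embeddings)
    groups = [[i for i, lab in enumerate(labels) if lab == g]
              for g in sorted(set(labels)) if g != -1]
    groups += [[i] for i, lab in enumerate(labels) if lab == -1]   # outliers start alone

    def centroid(g): return embeddings[g].mean(axis=0)
    def cost(g): return sum(token_costs[i] for i in g)

    # Phase 2: repeatedly merge the closest pair of clusters that still fits the budget.
    merged = True
    while merged and len(groups) > 1:
        merged = False
        cents = [centroid(g) for g in groups]
        pairs = sorted((float(np.linalg.norm(cents[a] - cents[b])), a, b)
                       for a in range(len(groups)) for b in range(a + 1, len(groups)))
        for _, a, b in pairs:
            if cost(groups[a]) + cost(groups[b]) <= max_tokens:
                groups[a] += groups.pop(b)
                merged = True
                break
    return groups
```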

Files are partitioned into token-bounded clusters, synthesized in parallel with cluster-local citations [1][2][3], then deterministically remapped to global numbers before the reduce phase combines summaries.

This avoids progressive compression loss from iterative summarization chains (summary → summary-of-summary). Each cluster synthesizes once with full context, preserving architectural details while enabling arbitrary scaling. Cluster-local citation namespaces enable maximum parallelism—no coordination needed during map phase. The reduce LLM integrates remapped summaries with explicit instructions to preserve citations (not generate new ones), ensuring every [N] traces to actual source files.
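The remapping itself is mechanical string rewriting, roughly like the following sketch (the function and argument names are assumptions for illustration):

```python
import re

def remap_citations(cluster_summaries: list[str],
                    cluster_sources: list[list[str]]) -> tuple[str, list[str]]:
    """Shift each cluster's local [1][2][3] citations into one global namespace."""
    global_sources: list[str] = []
    remapped: list[str] = []
    for summary, sources in zip(cluster_summaries, cluster_sources):
        offset = len(global_sources)          # citations in this cluster shift by this much
        global_sources.extend(sources)
        remapped.append(re.sub(r"\[(\d+)\]",
                               lambda m: f"[{int(m.group(1)) + offset}]",
                               summary))
    return "\n\n".join(remapped), global_sources
```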

Result: 10KB repos use single-pass synthesis (k=1), 1M+ LOC repos automatically scale to map-reduce (k=5+) without context collapse or citation hallucination.