repowise init: One Command to Index Your Entire Codebase

repowise team · 9 min read
Tags: repowise tutorial · repowise getting started · repowise init · index codebase · repowise setup

Every software engineer has experienced the "Day Zero" friction of a new project: cloning a massive repository, staring at a directory structure with 2,000 files, and trying to figure out where the entry point is. Traditional documentation is usually stale, and grep only takes you so far when you don't know what keywords to look for.

In this repowise tutorial, we’ll explore how to transform a raw directory of source code into a fully indexed, AI-ready intelligence hub using a single command: repowise init. Whether you are looking for a repowise getting started guide or a deep dive into codebase indexing, this post covers the mechanics of how repowise builds a comprehensive map of your software.

From Zero to Codebase Intelligence in One Command

Codebase intelligence is more than just full-text search. It is the synthesis of three distinct data layers: the structural (Abstract Syntax Trees), the historical (Git metadata), and the semantic (LLM-generated insights). Manually assembling these layers for a large project is a Herculean task that most teams simply ignore, leading to "tribal knowledge" silos.

The repowise init command was designed to automate this assembly. It is the entry point for the platform, taking a local path and producing a standardized .repowise directory that serves as the "brain" for both human developers and AI agents. By the end of the initialization process, you have a local web UI for exploration and a Model Context Protocol (MCP) server ready to feed high-fidelity context to tools like Claude Code or Cursor.

What repowise init Actually Does

When you run repowise init, the engine kicks off a multi-stage pipeline. It’s useful to understand these stages to appreciate how the platform maintains high "freshness" scores and accurate dependency maps.

Step 1: Repository Discovery

The process begins by scanning the target directory. Repowise respects your .gitignore and .dockerignore files by default, ensuring that node_modules, build artifacts, and binaries don't clutter the index. It establishes a root manifest and begins the process of identifying the "boundaries" of your modules.
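
The filtering idea can be sketched in a few lines. This is an illustrative stand-in, not repowise's actual scanner: the pattern list and `should_index` helper are hypothetical, and a real run would derive patterns from `.gitignore` and `.dockerignore` instead of hard-coding them.

```python
from fnmatch import fnmatch
from pathlib import Path

# Illustrative ignore patterns; a real run reads .gitignore/.dockerignore.
IGNORE_PATTERNS = ["node_modules", ".git", "dist", "*.pyc", "*.o"]

def should_index(path: str) -> bool:
    """Return False if any path component matches an ignore pattern."""
    return not any(
        fnmatch(part, pattern)
        for part in Path(path).parts
        for pattern in IGNORE_PATTERNS
    )

print(should_index("src/app/main.py"))           # True
print(should_index("node_modules/lodash/x.js"))  # False
```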

Step 2: Language Detection and AST Parsing

Repowise supports over 10 languages, including Python, TypeScript, Go, and Rust. For every file discovered, it uses specialized parsers to build an Abstract Syntax Tree (AST). This allows the system to identify symbols (classes, functions, variables) rather than just raw text. To see how this looks in practice, you can check out the auto-generated docs for FastAPI which were built using this exact parsing logic.
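
For a feel of what "symbols rather than raw text" means, here is a minimal sketch using Python's standard `ast` module (repowise presumably uses per-language parsers; this only illustrates the general technique on Python source):

```python
import ast

source = """
class UserService:
    def get_user(self, user_id):
        return user_id

def main():
    return UserService()
"""

# Walk the syntax tree and collect named symbols instead of grepping text.
tree = ast.parse(source)
symbols = [
    (type(node).__name__, node.name)
    for node in ast.walk(tree)
    if isinstance(node, (ast.ClassDef, ast.FunctionDef))
]
print(symbols)
```

The index ends up knowing that `get_user` is a method of `UserService`, which plain text search cannot tell you.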

Step 3: Git History Mining

Code is a living document. Repowise mines the .git folder to understand who owns which parts of the code and how often files change. By correlating commit frequency (churn) with code complexity, it identifies "hotspots"—areas of the code that are likely to contain bugs or technical debt. You can view the ownership map for Starlette to see how this git intelligence is visualized.
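
The churn/ownership correlation boils down to counting. A toy sketch with a hypothetical flattened commit log (file names and authors are made up; a real implementation would read the data from `.git`):

```python
from collections import Counter

# Hypothetical flattened commit log: (author, file touched).
commits = [
    ("alice", "auth/login.py"),
    ("alice", "auth/login.py"),
    ("bob",   "auth/login.py"),
    ("bob",   "README.md"),
]

churn = Counter(path for _, path in commits)          # change frequency per file
owners = Counter(author for author, path in commits
                 if path == "auth/login.py")          # who touches the hotspot

print(churn.most_common(1))   # most-churned file = hotspot candidate
print(owners.most_common(1))  # majority committer = likely owner
```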

Step 4: Dependency Graph Construction

By parsing import statements across your entire project, repowise builds a directed graph of your codebase. It doesn't just show that File A imports File B; it calculates PageRank scores to find the most "important" files and detects community clusters. This is critical for understanding the impact of a refactor.
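
PageRank on an import graph rewards files that many other files depend on. A self-contained power-iteration sketch over a three-file toy graph (the file names are illustrative; repowise's actual graph engine is not shown here):

```python
# Power-iteration PageRank over a toy import graph (damping factor 0.85).
imports = {
    "app.py":    ["utils.py", "models.py"],
    "models.py": ["utils.py"],
    "utils.py":  [],
}
nodes = list(imports)
rank = {n: 1.0 / len(nodes) for n in nodes}

for _ in range(50):
    new = {n: 0.15 / len(nodes) for n in nodes}
    for src, targets in imports.items():
        if targets:
            share = 0.85 * rank[src] / len(targets)
            for t in targets:
                new[t] += share
        else:  # dangling file: spread its rank evenly
            for n in nodes:
                new[n] += 0.85 * rank[src] / len(nodes)
    rank = new

most_important = max(rank, key=rank.get)
print(most_important)  # utils.py: everything else imports it
```

Before refactoring `utils.py`, you would want to know that both other files transitively depend on it; that is exactly the impact question the graph answers.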

[Figure: The repowise init Pipeline]

Step 5: LLM-Powered Documentation Generation

This is where the "intelligence" truly comes in. Repowise sends the extracted symbols and structural context to an LLM (OpenAI, Anthropic, or local Ollama) to generate high-level summaries. Unlike manual READMEs, these are generated per-module and per-symbol, and they include a "freshness" score that degrades when the underlying code changes.
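
The freshness mechanism can be sketched as a summary paired with a hash of the code it describes. This is an assumption-laden simplification: the `summarize` stub stands in for the real LLM call, and where repowise's score presumably degrades gradually, this sketch collapses it to fresh/stale:

```python
import hashlib

def summarize(code: str) -> str:
    # Placeholder for the real LLM call (OpenAI, Anthropic, or Ollama).
    return "Handles user login and session creation."

def make_doc(code: str) -> dict:
    """Pair the generated summary with a hash of the code it describes."""
    return {
        "summary": summarize(code),
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
    }

def freshness(doc: dict, current_code: str) -> float:
    """1.0 while the code is unchanged; 0.0 once the hash diverges."""
    current = hashlib.sha256(current_code.encode()).hexdigest()
    return 1.0 if current == doc["code_hash"] else 0.0

doc = make_doc("def login(user): ...")
print(freshness(doc, "def login(user): ..."))            # 1.0
print(freshness(doc, "def login(user, mfa=True): ..."))  # 0.0
```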

Step 6: Vector Index Creation

To support semantic search ("Where is the logic for user authentication?"), repowise creates a vector index using LanceDB or pgvector. It embeds the documentation, code snippets, and summaries, allowing for "fuzzy" conceptual lookups that go far beyond what rg or grep can provide.
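
Semantic lookup reduces to nearest-neighbor search over embeddings. The sketch below swaps the real embedding model and vector store for a toy bag-of-words vector and brute-force cosine similarity, purely to show the retrieval shape; file names and snippets are invented:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

index = {
    "auth/session.py":    "user authentication session token login",
    "billing/invoice.py": "invoice payment billing charge customer",
}
query = embed("where is the logic for user login")
best = max(index, key=lambda path: cosine(query, embed(index[path])))
print(best)  # auth/session.py
```

Note that the query shares no exact identifier with the winning file's summary beyond concepts like "user" and "login"; that conceptual overlap is what grep misses.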

Step 7: MCP Server Configuration

Finally, the initialization process generates the configuration for the Model Context Protocol (MCP) server. This exposes 8 structured tools—like get_risk() and get_architecture_diagram()—to AI agents. This bridge allows your AI assistant to "query" your codebase's structure rather than just reading raw files.
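
Conceptually, the server maps named tools to functions the agent can call with structured arguments. The registry below is a toy model of that dispatch pattern, not repowise's MCP implementation; the `get_risk` body and its heuristic are invented for illustration:

```python
# Toy registry mirroring the structured-tools idea behind an MCP server.
TOOLS = {}

def tool(fn):
    """Register a function under its own name so agents can call it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_risk(path: str) -> dict:
    # A real server would combine churn and complexity from the index here.
    return {"path": path, "risk": "high" if path.startswith("auth/") else "low"}

def handle(call: dict) -> dict:
    """Dispatch an agent's structured call to the matching tool."""
    return TOOLS[call["name"]](**call["args"])

print(handle({"name": "get_risk", "args": {"path": "auth/login.py"}}))
```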

Installation

Getting started with repowise setup is straightforward. The platform is distributed via PyPI and is designed to run in your local development environment.


First, ensure you have Python 3.9+ installed. Then, run:

pip install repowise

Choosing Your LLM Provider

Repowise is provider-agnostic. You can use hosted models for speed and quality, or local models for maximum privacy.

Provider        Best For                 Setup Requirement
Anthropic       Reasoning & MCP          ANTHROPIC_API_KEY
OpenAI          Speed & Cost             OPENAI_API_KEY
Google Gemini   Large Context Windows    GOOGLE_API_KEY
Ollama          100% Local / Private     Local Ollama Instance

Setting Up API Keys

Export your preferred key to your environment:

export ANTHROPIC_API_KEY="your-key-here"

Running Your First Index

With the CLI installed and your API key set, you are ready to index your codebase. Navigate to the root of the project you want to analyze.

Basic Usage

Run the following command to begin the indexing process:

repowise init .

The CLI will prompt you to select your LLM provider and choose which languages to index. For a standard repository, the process takes anywhere from 30 seconds to a few minutes, depending on the number of files and the speed of your LLM provider.

Output Walkthrough

As the command runs, you will see a progress bar for each stage of the pipeline:

  1. Scanning: Building the file list.
  2. Parsing: Extracting symbols from 10+ languages.
  3. Mining: Analyzing git history for hotspots and ownership.
  4. Summarizing: Generating LLM-powered documentation.
  5. Indexing: Creating the vector database.

What Gets Created

Once finished, you will notice a new .repowise directory in your project root. This directory contains:

  • index.db: The SQLite database containing structural and git metadata.
  • vectors/: The LanceDB vector store for semantic search.
  • repowise.yaml: Your project-specific settings.
  • CLAUDE.md: A static, LLM-friendly summary of the codebase.

[Figure: Repowise Index Artifacts]

Exploring the Results

After running repowise init, you have several ways to interact with your newly created codebase intelligence.

repowise serve (Web UI)

To visualize the data, run:

repowise serve

This launches a local web interface. Here, you can explore the dependency graph, view hotspot analysis (which files are most "dangerous" to change), and browse the auto-generated wiki. To see what the final output looks like before running it yourself, you can explore our live examples of popular open-source repos.

MCP Server (AI Agent Access)

This is arguably the most powerful feature. By indexing your codebase, you've essentially given your AI agent a "map." You can connect the repowise MCP server to Claude Desktop or Cursor. When you ask the agent a question, it can now call tools like:

  • get_overview(): To understand the architecture.
  • get_risk(): To see if a proposed change is in a high-churn area.
  • get_dead_code(): To find unused exports or zombie packages.
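
A dead-export check like the one behind get_dead_code() is, at heart, a set difference between what a file exports and what the rest of the repo imports. A toy sketch with invented file and symbol names:

```python
# Toy dead-export check: exported names that no other file imports.
exports = {"utils.py": {"slugify", "chunked", "legacy_hash"}}
imported_anywhere = {"slugify", "chunked"}

dead = {path: names - imported_anywhere for path, names in exports.items()}
print(dead)  # {'utils.py': {'legacy_hash'}}
```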

To understand the full capability of these tools, read about repowise's architecture and how the MCP server acts as a bridge between your code and the LLM.

CLAUDE.md (Static Context)

Repowise also generates a CLAUDE.md file. This is a condensed version of your codebase intelligence designed to be read by LLMs in a single pass. It includes the tech stack, entry points, and coding standards, providing an instant "personality" for your project when using chat-based AI tools.

Customization Options

While the default settings work for most projects, you can fine-tune the repowise setup via the repowise.yaml file created during init.

Provider Selection

You can mix and match providers. For example, you might use OpenAI for fast vector embeddings but Anthropic for high-quality architectural summaries.

Excluding Paths

If you have a large docs/ folder or legacy vendor/ directories that don't need indexing, add them to the exclude list in your config:

exclude:
  - "**/legacy/**"
  - "**/tests/fixtures/**"

Incremental vs Full Index

By default, repowise init is smart. If you run it again, it only processes files that have changed since the last index (based on git hashes). This makes maintaining the index extremely cheap in terms of API costs and time.
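
The skip logic amounts to comparing stored content hashes against the current tree. A minimal sketch (the storage format and helper names are assumptions; repowise compares git hashes, while this toy hashes content directly):

```python
import hashlib

def digest(content: str) -> str:
    return hashlib.sha256(content.encode()).hexdigest()

# Hashes persisted by the previous indexing run.
previous = {"app.py": digest("print('v1')")}

def files_to_reindex(current: dict) -> list:
    """Only files whose hash changed (or that are new) need re-processing."""
    return [path for path, content in current.items()
            if previous.get(path) != digest(content)]

current = {"app.py": "print('v1')", "new_module.py": "print('hello')"}
print(files_to_reindex(current))  # ['new_module.py']
```

Unchanged files never reach the LLM, which is why re-running init stays cheap.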

[Figure: AI Agent Context Bridge]

Troubleshooting Common Issues

1. "Rate limit exceeded" during indexing
If you are indexing a very large codebase (e.g., >5,000 files), you may hit LLM API rate limits. Solution: lower the --concurrency flag to make fewer parallel requests, or switch to a local Ollama instance for the summarization step.

2. Missing dependencies in the graph
Repowise uses static analysis. If your project uses dynamic imports or highly complex build-time aliases, some connections might be missed. Solution: check repowise.yaml to ensure your tsconfig.json paths or sys.path equivalents are correctly mapped.

3. Memory usage on large repos
Building a full dependency graph for a massive monorepo can be memory-intensive. Solution: use the --exclude flag to index one package at a time, or increase the available memory for the Python process.

Key Takeaways

The repowise init command is the foundation of a modern development workflow. By automating the extraction of structural, historical, and semantic data, it bridges the gap between raw source code and actionable intelligence.

  • One Command: repowise init handles everything from AST parsing to vector indexing.
  • Multi-Layered: It combines Git history, dependency graphs, and LLM summaries.
  • Agent-Ready: It automatically configures an MCP server with 8 specialized tools for AI agents.
  • Privacy-First: Supports local LLMs via Ollama, keeping your proprietary code on your machine.
  • Low Maintenance: Incremental indexing ensures your docs and graphs stay fresh as your code evolves.

Ready to see it in action? Head over to the FastAPI dependency graph demo to see the kind of insights you can generate for your own projects in just a few minutes.

Try repowise on your repo

One command indexes your codebase.