Git Hotspot Analysis: Find the Riskiest Files in Your Codebase
Every codebase has a "haunted graveyard"—a module so fragile and convoluted that developers avoid touching it at all costs. These files are often the source of your most persistent bugs, the highest cognitive load during onboarding, and the biggest bottlenecks in your CI/CD pipeline. Identifying these areas shouldn't rely on tribal knowledge or gut feelings. Instead, sophisticated engineering teams use git hotspot analysis to pinpoint exactly where their technical debt is most dangerous.
By combining historical git data with static code analysis, you can identify high-risk files before they cause a production incident. In this guide, we’ll explore the mechanics of code hotspots, the research behind them, and how you can use tools like repowise to automate this analysis within your workflow.
What Are Code Hotspots?
A code hotspot is a file or module that exhibits both high complexity and high change frequency (churn). While a complex file that never changes is "stable debt," and a simple file that changes daily is just "active development," the intersection of the two creates a high-probability zone for defects.
The Churn x Complexity Formula
A widely used heuristic for identifying hotspots is simple but powerful:
Risk Score = (Code Churn) × (Code Complexity)
- Code Churn: This measures how many times a file has been modified over a specific period (typically the last 90 days). High churn indicates that the requirements for this area are volatile, the original design didn't account for current needs, or the code is so buggy it requires constant "fixing."
- Code Complexity: This measures how difficult the code is to understand. It can be calculated via Cyclomatic Complexity (the number of independent execution paths), Cognitive Complexity, or even a simple Lines of Code (LoC) count.
When you plot these two metrics on a scatter plot, the files in the upper-right quadrant are your hotspots.
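The formula above can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual scoring: the `FileStats` type, the file paths, and the choice to normalize each metric by its maximum (so scores fall between 0 and 1) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    churn: int        # commits touching the file in the analysis window
    complexity: int   # any complexity measure, e.g. cyclomatic complexity or LoC


def rank_hotspots(files: list[FileStats]) -> list[tuple[str, float]]:
    """Return (path, score) pairs sorted by normalized churn x complexity."""
    if not files:
        return []
    # Guard against division by zero when every value is 0.
    max_churn = max(f.churn for f in files) or 1
    max_complexity = max(f.complexity for f in files) or 1
    scored = [
        (f.path, (f.churn / max_churn) * (f.complexity / max_complexity))
        for f in files
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical numbers for three files.
stats = [
    FileStats("billing/invoice.py", churn=40, complexity=120),  # complex and busy
    FileStats("legacy/parser.py", churn=2, complexity=200),     # "stable debt"
    FileStats("api/routes.py", churn=50, complexity=15),        # "active development"
]
ranking = rank_hotspots(stats)
```

With these made-up numbers, `billing/invoice.py` ranks first even though it has neither the highest churn nor the highest complexity, which is the point of multiplying the two.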
Why Hotspots Predict Bugs
The correlation between hotspots and bugs isn't just theoretical. Research suggests that a small percentage of a codebase (often less than 5%) is responsible for the majority of its bugs. These "bug magnets" are almost always the files that are touched most frequently by the most people.
When a file is complex, it's harder for a developer to hold the entire state in their head. When that same file is changed frequently, there are more opportunities for a developer to make an incorrect assumption about that state, so the likelihood of a defect rises sharply. This is why technical debt detection must focus on activity, not just static analysis.
Figure: The Churn-Complexity Matrix
The Science Behind Hotspot Analysis
The concept of hotspot analysis is rooted in empirical software engineering research.
Research: Frequently Changed + Complex = Buggy
Adam Tornhill (author of Your Code as a Crime Scene) popularized the idea that social metrics, meaning how we work with the code, are often better predictors of bugs than the code itself. A 2008 Microsoft Research study by Nagappan, Murphy, and Basili found that "organizational metrics" (such as the number of developers and the frequency of changes) were more accurate at predicting defects than traditional code metrics such as complexity.
Commit Frequency as a Signal
Every commit is a data point. If a file is changed in 50 different pull requests over three months, it usually signals one of three things:
- It is a "God Object" that knows too much and is involved in too many features.
- The abstractions are "leaky," requiring changes to this file whenever an external dependency changes.
- The file is a "bug farm" where every fix introduces a new regression.
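Commit frequency is easy to measure yourself. A common approach is to run `git log --since="90 days ago" --name-only --pretty=format:` (which prints only the paths touched by each commit) and count how often each path appears. The sketch below parses that kind of output; the sample log text and file names are invented for illustration.

```python
from collections import Counter


def count_churn(git_log_output: str) -> Counter:
    """Count how many commits touched each path.

    Expects the output of:
        git log --since="90 days ago" --name-only --pretty=format:
    where each non-blank line is one path touched by one commit.
    """
    return Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )


# Invented sample: three commits, separated by blank lines.
SAMPLE_LOG = """\
src/auth.py
src/models.py

src/auth.py

src/auth.py
README.md
"""

churn = count_churn(SAMPLE_LOG)
most_changed = churn.most_common(1)[0]
```

In a real repository you would feed the function the captured output of the git command instead of `SAMPLE_LOG`.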
Complexity Metrics That Matter
While line counts are a crude proxy, modern intelligence platforms like repowise use Abstract Syntax Tree (AST) parsing to calculate more nuanced metrics. For example, a 500-line configuration file is significantly less risky than a 200-line nested if-else block in a C++ engine. By parsing the AST, we can look at:
- Cyclomatic Complexity: The number of linearly independent paths through a program's source code.
- Deeply Nested Logic: A high-weight signal for cognitive load.
- Fan-in/Fan-out: How many other modules depend on this file, and how many does it depend on?
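To make the first two metrics concrete, here is a rough sketch that uses Python's built-in `ast` module as a stand-in for a tree-sitter parser. It only approximates cyclomatic complexity (one plus the number of decision points it recognizes) and it treats `elif` chains as nested, because that is how Python's AST represents them. The `classify` sample function is hypothetical.

```python
import ast

# Statement types that open a new level of nesting.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)
# Node types that add one decision point each.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            decisions += len(node.values) - 1
        elif isinstance(node, DECISION_NODES):
            decisions += 1
    return 1 + decisions


def max_nesting_depth(source: str) -> int:
    """Deepest level of nested control-flow statements."""
    def depth(node: ast.AST, current: int) -> int:
        deepest = current
        for child in ast.iter_child_nodes(node):
            child_level = current + 1 if isinstance(child, NESTING_NODES) else current
            deepest = max(deepest, depth(child, child_level))
        return deepest

    return depth(ast.parse(source), 0)


SAMPLE = '''
def classify(order):
    if order.total > 100 and order.customer.is_vip:
        for item in order.items:
            if item.backordered:
                return "hold"
    return "ship"
'''

complexity = cyclomatic_complexity(SAMPLE)  # 1 + if + and + for + if
nesting = max_nesting_depth(SAMPLE)         # if -> for -> if
```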
How repowise Calculates Hotspots
repowise is an open-source, self-hostable codebase intelligence platform designed to make this data accessible to every developer. It doesn't just look at the code; it mines the entire history of your repository.
Data Sources: Git Log + AST
Repowise combines two distinct streams of data:
- Git Intelligence: It walks the git commit history, tracking how many times each file was modified, who touched it (for bus factor analysis), and which files tend to change together (co-change patterns).
- Structural Intelligence: It uses tree-sitter to parse languages like Python, TypeScript, Go, and Rust, building a complete map of symbols and their complexities.
The Scoring Algorithm
The internal scoring engine in repowise weights these factors to produce a normalized "Risk Score." It accounts for:
- Recency Bias: A change made yesterday is more relevant than a change made two years ago.
- Complexity Density: Complexity relative to the size of the file.
- Ownership Fragmentation: If 10 different people have touched a complex file recently, the risk score increases because no single person has a complete mental model of the file.
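The three factors above can be combined in many ways. The following sketch shows one possibility and is not repowise's actual algorithm: the 30-day half-life, the `Change` type, and the way the terms are multiplied together are all assumptions chosen to keep the example small, and the result is a raw score rather than a normalized one.

```python
from dataclasses import dataclass


@dataclass
class Change:
    days_ago: float
    author: str


def recency_weighted_churn(changes: list[Change], half_life_days: float = 30.0) -> float:
    """Each change counts for 1.0 today and loses half its weight every half-life."""
    return sum(0.5 ** (c.days_ago / half_life_days) for c in changes)


def complexity_density(complexity: int, lines_of_code: int) -> float:
    """Complexity relative to file size."""
    return complexity / lines_of_code if lines_of_code else 0.0


def ownership_fragmentation(changes: list[Change]) -> float:
    """1 minus the top author's share of changes; 0.0 means a single owner."""
    if not changes:
        return 0.0
    counts: dict[str, int] = {}
    for c in changes:
        counts[c.author] = counts.get(c.author, 0) + 1
    return 1.0 - max(counts.values()) / len(changes)


def risk_score(changes: list[Change], complexity: int, lines_of_code: int) -> float:
    """Illustrative combination: fragmentation scales the score by up to 2x."""
    churn = recency_weighted_churn(changes)
    density = complexity_density(complexity, lines_of_code)
    return churn * density * (1.0 + ownership_fragmentation(changes))


# Hypothetical file: two recent changes by two different authors.
history = [Change(days_ago=0, author="ana"), Change(days_ago=30, author="ben")]
score = risk_score(history, complexity=20, lines_of_code=100)
```

An exponential decay is a convenient way to express recency bias because a single parameter (the half-life) controls how quickly old changes stop mattering.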
Co-Change Partners
One of the most powerful features of the repowise engine is detecting "Temporal Coupling." If AuthService.ts and PaymentGateway.ts are changed together in 80% of commits, they are logically coupled even if they don't have a direct code dependency. Repowise flags these as high-risk pairs; a change to one without the other is a likely source of bugs. You can see this in action on the ownership map for Starlette.
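Temporal coupling can be estimated directly from commit history. The sketch below is one simple way to do it, not necessarily how repowise computes it: for every pair of files it divides the number of commits that touched both by the number of commits that touched either one. That choice of denominator is an assumption, and the commit data is invented.

```python
from collections import Counter
from itertools import combinations


def co_change_ratios(commits: list[set[str]]) -> dict[tuple[str, str], float]:
    """Map each file pair to (commits touching both) / (commits touching either)."""
    file_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for files in commits:
        file_counts.update(files)
        # Sort so each pair has one canonical ordering.
        pair_counts.update(combinations(sorted(files), 2))
    return {
        (a, b): shared / (file_counts[a] + file_counts[b] - shared)
        for (a, b), shared in pair_counts.items()
    }


# Invented history: five commits, each represented by the set of files it touched.
commits = [
    {"AuthService.ts", "PaymentGateway.ts"},
    {"AuthService.ts", "PaymentGateway.ts"},
    {"AuthService.ts", "PaymentGateway.ts"},
    {"AuthService.ts", "PaymentGateway.ts"},
    {"AuthService.ts", "README.md"},
]
ratios = co_change_ratios(commits)
coupling = ratios[("AuthService.ts", "PaymentGateway.ts")]
```

Here the two services share four of the five commits that touch either of them, so the pair scores 0.8.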
90-Day Rolling Window
By default, repowise focuses on a 90-day rolling window. This ensures that the git hotspot analysis remains actionable. You don't want to be alerted about a complex file that was refactored and stabilized two years ago; you want to know about the complexity that is currently slowing your team down.
Figure: Repowise Hotspot Report Output
Running Hotspot Analysis
Getting started with repowise is straightforward. Because it's self-hostable (AGPL-3.0), your code never has to leave your infrastructure.
repowise init and the Hotspot Report
To analyze a repository, you simply run:
# Install the CLI
npm install -g @repowise/cli
# Initialize repowise in your repo
repowise init
# Generate the intelligence report
repowise analyze --hotspots
This command triggers the AST parsers and git miners. The resulting report provides a ranked list of the most volatile files in your project. To see what this output looks like, check out the hotspot analysis demo.
Interpreting the Results
When you look at your hotspot report, don't just look at the top item. Look for patterns:
- Are the hotspots concentrated in one module? This suggests an architectural boundary issue.
- Is the churn caused by one specific developer? This might be a training or ownership opportunity.
- Are "Utility" files appearing as hotspots? This is a red flag that your utilities are doing too much and should be broken down.
Using get_risk() for Individual Files
Repowise also provides a suite of 8 MCP tools for AI agents. If you are using Claude Code, Cursor, or Cline, you can use the get_risk() tool to query the risk profile of a specific file during development.
Example MCP tool call:
    {
      "name": "get_risk",
      "arguments": {
        "path": "src/services/auth_provider.py"
      }
    }
The tool returns the hotspot score, recent churn, and a list of "co-change partners" that you should probably check before submitting your PR.
What to Do With Your Hotspot List
Identifying the hotspots is only half the battle. The goal is to reduce the risk.
1. Prioritize Refactoring
If a file has a high risk score, it is the best candidate for your next "refactor sprint." Reducing the complexity of a high-churn file has a much higher ROI than refactoring a stable, complex file.
2. Add Test Coverage
If you can't refactor a hotspot immediately, surround it with tests. Hotspots should ideally have 90%+ branch coverage because they are the most likely places for regressions to occur.
3. Split Large Files
Often, a file is a hotspot simply because it's too big. Splitting a "God Object" into smaller, single-responsibility modules won't necessarily reduce the total complexity of the system, but it will reduce the churn per file and make the code easier to reason about.
4. Assign Clear Ownership
Risk increases when many people touch the same complex code. Assigning a "Primary Maintainer" to a hotspot ensures that at least one person has the full context required to review changes safely. You can use the ownership map to see who currently "owns" different parts of your codebase based on git history.
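If you want a quick, tool-independent way to find a candidate maintainer, you can list the authors of recent commits to a file (for example with `git log --since="90 days ago" --format=%an -- <path>`) and compute each person's share. The sketch below assumes you already have that list of author names; the names are made up.

```python
from collections import Counter


def ownership_shares(authors: list[str]) -> list[tuple[str, float]]:
    """Return (author, share of commits) pairs, largest share first."""
    counts = Counter(authors)
    total = sum(counts.values())
    return [(author, n / total) for author, n in counts.most_common()]


# One entry per commit that touched the file (hypothetical data).
recent_authors = ["ana", "ana", "ana", "ben", "chen"]
shares = ownership_shares(recent_authors)
primary_candidate = shares[0][0]
```

A dominant share suggests a natural owner; a flat distribution suggests nobody currently holds the full context.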
Figure: MCP get_risk() Visualization
Hotspot Analysis in Practice: FastAPI Example
Let's look at a real-world example. If we run repowise on the FastAPI repository, we can see how the intelligence engine maps out the project. FastAPI is extremely well-maintained, but like any large project, it has areas of high activity.
By exploring the auto-generated docs for FastAPI, we can see how repowise identifies core routing logic as an area of high importance. Using the FastAPI dependency graph demo, we can then see how these hotspots are connected to the rest of the system. If a hotspot in the core dependency injection system changes, the "blast radius" is significant because so many other modules depend on it.
Integrating Hotspots Into Your Workflow
Intelligence is only useful if it's surfaced at the right time.
PR Review: Check If You're Touching a Hotspot
Integrate repowise into your CI pipeline. When a PR is opened, the repowise bot can comment:
"⚠️ You are modifying auth_logic.py, which is a top-5 hotspot in this repo. Please ensure you have updated the integration tests."
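The logic behind such a check is small enough to sketch. The function below is a hypothetical illustration, not the repowise bot: it assumes you already have the list of files changed in the PR and a list of hotspot paths ordered from riskiest to least risky.

```python
def hotspot_warnings(
    changed_files: list[str],
    ranked_hotspots: list[str],
    top_n: int = 5,
) -> list[str]:
    """Return one warning per changed file that is among the top_n hotspots."""
    top = ranked_hotspots[:top_n]
    warnings = []
    for path in changed_files:
        if path in top:
            rank = top.index(path) + 1
            warnings.append(
                f"Warning: you are modifying {path}, hotspot #{rank} of the "
                f"top {top_n} in this repo. Please ensure you have updated "
                "the integration tests."
            )
    return warnings


# Hypothetical inputs: a hotspot ranking and the files touched by a PR.
ranked = ["auth_logic.py", "billing.py", "routes.py"]
comments = hotspot_warnings(["auth_logic.py", "docs/index.md"], ranked)
```

A CI job could post each returned string as a PR comment, or fail the build when the list is non-empty and no test files were changed.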
Sprint Planning: Factor In Risk
During planning, use the hotspot report to justify technical debt work. Instead of saying "we need to clean up the code," you can say "this specific module has a risk score of 0.92 and has been involved in 4 of our last 5 production bugs." This data-driven approach is much more effective at getting buy-in from product managers.
To learn more about how the underlying engine works, check out our architecture page.
Key Takeaways
- Code Hotspots are the intersection of high churn and high complexity.
- Git Hotspot Analysis uses historical data to predict where future bugs will occur.
- Repowise automates this by mining git logs and parsing ASTs to provide actionable risk scores.
- Focus your refactoring on areas where the risk score is highest for the best ROI on your time.
- Leverage MCP tools like get_risk() to bring this intelligence directly into your AI-assisted coding workflow.
Technical debt isn't just about "bad code"—it's about the risk that code poses to your velocity and stability. By identifying your hotspots, you can stop guessing and start engineering with confidence.
FAQ
Q: Is a high churn file always a problem?
A: No. During the early stages of a feature, churn is expected. Hotspot analysis is most effective on mature codebases where churn indicates instability rather than rapid feature growth.
Q: Which complexity metric is best?
A: Repowise uses a weighted combination. While Cyclomatic Complexity is classic, we find that "Cognitive Complexity" (how hard it is for a human to read) is often a better predictor of developer error.
Q: Can I run this on my local machine?
A: Yes. Repowise is designed to be self-hosted. You can run it as a CLI tool or deploy it as a full platform within your VPC. See our GitHub for setup instructions.


