Building a Dependency Graph for Any Codebase (Python, TS, Go, Rust, Java)

repowise team · 10 min read

Every software engineer has experienced the "wall of code" phenomenon. You join a new project, clone the repository, and find yourself staring at 100,000 lines of code spread across hundreds of files. You make a "simple" change to a utility function in a Python module, only to watch a seemingly unrelated service in a different directory break during integration tests. This is the fundamental challenge of modern software development: we don't just write code; we manage complex, hidden webs of interconnectedness.

To navigate this complexity, a code dependency graph is not a luxury—it is a map. Without one, you are orienteering without a compass. A robust dependency graph tool transforms a flat directory of files into a structured, navigable topology, allowing you to visualize code dependencies and understand the ripple effects of every commit.

At repowise, we built our platform on the belief that codebase intelligence starts with understanding these relationships. By parsing imports across 10+ languages and applying graph theory algorithms like PageRank and community detection, we turn raw source code into actionable architectural insights.

Why Dependency Graphs Matter

As codebases grow, the cognitive load required to maintain a mental model of the system scales non-linearly. A dependency graph provides a mathematical and visual representation of how your components interact, which is critical for several engineering workflows.

Understanding Code Structure at Scale

Most developers understand the "local" context—the file they are currently editing and its immediate imports. However, understanding the "global" context—how that file fits into the broader system architecture—is much harder. A dependency graph generator allows you to zoom out, seeing the high-level boundaries between modules and identifying where those boundaries have been breached.

Finding Critical Files (PageRank)

Not all files are created equal. Some are leaf nodes with no dependents, while others are "load-bearing" modules that the entire system relies upon. By applying the PageRank algorithm to a code dependency graph, we can identify these critical nodes. A file earns a high PageRank when many files depend on it, directly or transitively, and especially when those dependents are themselves heavily depended upon. These are your "high-stakes" files; a bug here is a systemic failure, and a refactor here requires extreme caution.
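Here is a minimal power-iteration PageRank in plain Python that makes the idea concrete. It is a sketch, not repowise's implementation, and it rests on a few assumptions:

  • The graph is a dict that maps each file to the files it imports, so rank flows from importers to the files they import.
  • The damping factor is the conventional 0.85.
  • The rank held by files that import nothing is spread evenly across all files.

The function name pagerank and the example file names are hypothetical.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Power-iteration PageRank over an import graph (edge A -> B means A imports B)."""
    # Every file that appears as an importer or an import target is a node.
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    if n == 0:
        return {}
    out = {v: sorted(set(graph.get(v, ()))) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Files that import nothing ("dangling" nodes) share their rank with everyone.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                # Each importer passes an equal share of its rank to every file it imports.
                share = damping * rank[v] / len(out[v])
                for t in out[v]:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    imports = {
        "api/routes.py": ["services/orders.py", "core/db.py"],
        "services/orders.py": ["core/db.py"],
        "cli.py": ["core/db.py"],
    }
    for path, score in sorted(pagerank(imports).items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {path}")
```

In this toy graph, core/db.py is imported by every other file, and it ends up with the highest score. The scores always sum to 1, so they can be compared across files in the same graph.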

[Figure: Dependency Graph Topology & PageRank]

Detecting Circular Dependencies

Circular dependencies (where File A imports File B, which eventually imports File A) are a form of architectural debt. They make code harder to test, prevent proper modularization, and in some languages, cause runtime initialization errors. A dependency graph makes these cycles immediately visible, allowing teams to break the loop and move toward a more maintainable, acyclic structure.
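One common way to surface these cycles is to compute strongly connected components. Any component with more than one file, or a file that imports itself, is a set of files tangled in at least one cycle. The sketch below uses Tarjan's algorithm on a hypothetical dict-of-imports graph, and import_cycles is an illustrative name. It is recursive for readability, so a very deep graph may need an iterative version or a higher recursion limit.

```python
from typing import Dict, List


def import_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return groups of files that form import cycles, via Tarjan's SCC algorithm."""
    index: Dict[str, int] = {}  # discovery order of each file
    low: Dict[str, int] = {}    # smallest index reachable from the file's DFS subtree
    stack: List[str] = []
    on_stack = set()
    components: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a strongly connected component: pop the whole component.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    for v in nodes:
        if v not in index:
            visit(v)
    # Keep only real cycles: multi-file components, or a file that imports itself.
    return [c for c in components if len(c) > 1 or c[0] in graph.get(c[0], ())]


if __name__ == "__main__":
    graph = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": ["a.py"], "d.py": ["a.py"]}
    print(import_cycles(graph))  # [['a.py', 'b.py', 'c.py']]
```

Note that d.py depends on the cycle without being part of it, so it is not reported.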

How Dependency Graphs Work

Building a reliable dependency graph is more complex than a simple grep for "import" statements. It requires a deep understanding of the language's semantics and module resolution logic.

Parsing Imports Across Languages

The first step is extraction. A sophisticated dependency graph tool must parse the source code into an Abstract Syntax Tree (AST). By analyzing the AST, the tool can distinguish between a real import, a commented-out line, and a string that happens to look like a file path.
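Python's standard-library ast module is enough to illustrate the point. This minimal sketch walks the whole tree, so it also sees imports nested inside functions. The commented-out import never becomes a node, and the import-like string never becomes an import node either. The function name extract_imports is hypothetical.

```python
import ast
from typing import List


def extract_imports(source: str) -> List[str]:
    """Return imported module names from Python source, using the AST instead of text search."""
    tree = ast.parse(source)
    found: List[str] = []
    for node in ast.walk(tree):  # visits every node, including function bodies
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep relative imports recognizable: level 2 + "utils" -> "..utils".
            found.append("." * node.level + (node.module or ""))
    return found


SAMPLE = '''
import os
# import secrets  (a comment, not an import)
banner = "import this_is_just_a_string"
from ..utils import helper

def load():
    import json
    return json
'''

if __name__ == "__main__":
    print(extract_imports(SAMPLE))
```

The result contains os, ..utils and json, but not secrets or this_is_just_a_string. ast.walk does not promise a particular visiting order, so sort the result if order matters.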

For example, in a TypeScript codebase, the parser must handle:

  • Static imports: import { func } from './module'
  • Dynamic imports: import('./module').then(...)
  • Re-exports: export * from './module'
  • Type-only imports: import type { User } from './types'

Building the Directed Graph

Once the imports are extracted, they are modeled as a directed graph G = (V, E).

  • Nodes (V): Represent files or modules.
  • Edges (E): Represent the relationship "A imports B". The edge is directed because the dependency has a specific direction.

[Figure: Node = File/Module, Edge = Import]

In repowise, we treat the file as the primary unit of the graph, but we also support "module-level" aggregation. This allows you to see dependencies between directories (e.g., does the frontend directory depend on the backend directory?), which is often more useful for high-level architecture reviews than a granular file-level view. You can see this in action by exploring the FastAPI dependency graph demo.
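Module-level aggregation can be as simple as collapsing each file to its top-level directory and counting the file edges that cross directory boundaries. This is a hypothetical sketch of the idea, not repowise's aggregation logic. The depth parameter controls how many directory levels define a "module", and the function names are illustrative.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Tuple


def module_of(path: str, depth: int = 1) -> str:
    """Collapse a file path to its first `depth` directory levels ('.' for root files)."""
    parts = PurePosixPath(path).parent.parts[:depth]
    return "/".join(parts) or "."


def directory_edges(file_edges: Iterable[Tuple[str, str]], depth: int = 1) -> Counter:
    """Count file-level imports that cross module (directory) boundaries."""
    counts: Counter = Counter()
    for src, dst in file_edges:
        a, b = module_of(src, depth), module_of(dst, depth)
        if a != b:  # imports inside the same module are hidden at this zoom level
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    edges = [
        ("frontend/app.ts", "backend/api.ts"),
        ("frontend/app.ts", "frontend/ui/button.ts"),
        ("frontend/ui/button.ts", "shared/theme.ts"),
        ("backend/api.ts", "shared/theme.ts"),
    ]
    for (a, b), n in directory_edges(edges).most_common():
        print(f"{a} -> {b}: {n} import(s)")
```

A non-zero count for frontend -> backend answers the "does the frontend depend on the backend?" question directly. The edge count also hints at how hard that coupling would be to remove.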

Language-Specific Parsing Challenges

Every language has its own philosophy for module resolution. A universal dependency graph generator must account for these nuances to ensure accuracy.

Python: Relative Imports and __init__.py

Python's import system is notoriously flexible. __init__.py files mark regular packages (since Python 3.3, namespace packages can also exist without one). Relative imports (e.g., from ..utils import helper) require the parser to know the file's location within the package hierarchy. Python also allows import statements inside function bodies. Basic static analysis tools that only scan module-level statements often miss these deferred imports, but repowise's advanced AST walking captures them.
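Resolving a relative import means turning the importing file's path into a dotted package name, then climbing one package level for each extra dot. This sketch follows Python's rule for regular packages: an __init__.py file anchors to its own package, and any other module anchors to its parent package. It assumes paths are relative to the source root and ignores namespace packages and sys.path configuration. The function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def module_name(path: str) -> Tuple[str, bool]:
    """'app/api/routes.py' -> ('app.api.routes', False); 'app/api/__init__.py' -> ('app.api', True)."""
    parts = list(PurePosixPath(path).with_suffix("").parts)
    is_package = parts[-1] == "__init__"
    if is_package:
        parts = parts[:-1]
    return ".".join(parts), is_package


def resolve_relative(path: str, level: int, module: Optional[str]) -> str:
    """Absolute module targeted by `from <level dots><module> import ...` inside `path`."""
    name, is_package = module_name(path)
    # A package's __init__.py is its own anchor; a plain module anchors to its parent.
    package = name if is_package else name.rpartition(".")[0]
    parts = package.split(".") if package else []
    if level - 1 >= len(parts):
        raise ImportError(f"relative import beyond top-level package in {path}")
    base = parts[: len(parts) - (level - 1)]  # one dot = current package
    return ".".join(base + ([module] if module else []))


if __name__ == "__main__":
    # from ..utils import helper, written inside app/api/routes.py
    print(resolve_relative("app/api/routes.py", 2, "utils"))  # app.utils
```

The level and module arguments correspond to the level and module fields of an ast.ImportFrom node. This lets the resolver consume a parser's output directly.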

TypeScript/JavaScript: ESM, CJS, and Path Aliases

The JS ecosystem is a fragmented landscape of CommonJS (require) and ES Modules (import). Beyond that, modern projects often use "Path Aliases" defined in tsconfig.json or webpack.config.js. For example, @/components/Button might map to ./src/shared/components/ui/Button.tsx. A tool that doesn't resolve these aliases will produce a broken graph.
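Alias resolution can be sketched by applying the compilerOptions.paths mapping from tsconfig.json by hand. It is written in Python for consistency with the other examples and is a deliberate simplification:

  • It handles patterns that contain at most one *.
  • It prefers an exact pattern, then the pattern with the longest prefix before the *, roughly mirroring the TypeScript compiler.
  • It probes an assumed list of extensions against a set of known files instead of the real filesystem.
  • It ignores baseUrl subtleties, extends, and package.json exports.

resolve_alias is a hypothetical name.

```python
import posixpath
from typing import Dict, List, Optional, Set

# Assumed probe order; real resolution also consults package.json and more.
EXTENSIONS = ("", ".ts", ".tsx", ".d.ts", "/index.ts", "/index.tsx")


def match_pattern(spec: str, pattern: str) -> Optional[str]:
    """Return the text captured by '*' ('' for an exact match), or None if no match."""
    if "*" not in pattern:
        return "" if spec == pattern else None
    prefix, _, suffix = pattern.partition("*")
    if len(spec) >= len(prefix) + len(suffix) and spec.startswith(prefix) and spec.endswith(suffix):
        return spec[len(prefix): len(spec) - len(suffix)]
    return None


def resolve_alias(spec: str, paths: Dict[str, List[str]], existing: Set[str],
                  base_url: str = ".") -> Optional[str]:
    """Resolve an aliased import specifier to a project file, or None if no alias applies."""
    matches = []
    for pattern, targets in paths.items():
        captured = match_pattern(spec, pattern)
        if captured is not None:
            # Exact patterns win; otherwise the longest prefix before '*' wins.
            priority = float("inf") if "*" not in pattern else len(pattern.partition("*")[0])
            matches.append((priority, captured, targets))
    for _, captured, targets in sorted(matches, key=lambda m: -m[0]):
        for target in targets:
            base = posixpath.normpath(posixpath.join(base_url, target.replace("*", captured)))
            for ext in EXTENSIONS:
                if base + ext in existing:
                    return base + ext
    return None


if __name__ == "__main__":
    paths = {"@/components/*": ["./src/shared/components/ui/*"], "@/*": ["./src/*"]}
    files = {"src/shared/components/ui/Button.tsx", "src/lib/api.ts"}
    print(resolve_alias("@/components/Button", paths, files))  # src/shared/components/ui/Button.tsx
    print(resolve_alias("@/lib/api", paths, files))            # src/lib/api.ts
```

A None result, as for a bare specifier like react, means the import is not aliased. It should then go through normal package resolution.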

Go: Package-Based Imports

Go organizes code by packages rather than files. All files in a single directory belong to the same package. Import paths name packages, not files: once a file in package A imports package B, it can use the exported identifiers declared in any file of B. This makes the package the natural node of the graph. The tool still has to map each import path to the directory that holds the package. For code inside the same module, that means stripping the module path declared in go.mod.
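Inside a single Go module, this mapping is mostly string manipulation. An import path that starts with the module path from go.mod names a directory in the repository. Anything else is the standard library or another module. Here is a minimal sketch under that assumption. It ignores replace directives, vendoring, and multi-module workspaces, and the function names are hypothetical.

```python
from typing import Optional


def module_path(go_mod: str) -> str:
    """Read the module path from the text of a go.mod file."""
    for line in go_mod.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "module":
            return fields[1].strip('"')
    raise ValueError("go.mod has no module directive")


def import_to_dir(import_path: str, mod: str) -> Optional[str]:
    """Map a Go import path to a repo-relative package directory, or None if external."""
    if import_path == mod:
        return "."
    if import_path.startswith(mod + "/"):
        return import_path[len(mod) + 1:]
    return None  # standard library or another module


if __name__ == "__main__":
    mod = module_path("module github.com/acme/shop\n\ngo 1.22\n")
    for imp in ("github.com/acme/shop/internal/cart", "fmt", "github.com/acme/shopping"):
        print(imp, "->", import_to_dir(imp, mod))
```

The last case shows why the check uses mod + "/". github.com/acme/shopping shares a string prefix with the module but is a different module entirely.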

Rust: mod.rs and Crate Boundaries

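Rust uses a hierarchical module system. A mod name; declaration in a parent file defines a child module. The compiler looks for its source in a file named name.rs, or in name/mod.rs, located relative to the parent module. Rust also has strict "crate" boundaries. Paths that start with crate::, self::, or super:: stay inside the current crate, while paths that start with a dependency's name cross into an external crate. Understanding the difference between an internal module dependency and an external crate dependency is vital for accurate visualization.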
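Two small helpers show what a Rust-aware parser has to encode:

  • The first lists the candidate files for a mod name; declaration. Files named lib.rs, main.rs, and mod.rs own their directory, while any other file foo.rs owns a foo/ subdirectory.
  • The second labels a use path as internal or external.

Both are hypothetical sketches. They ignore #[path] attributes and inline mod blocks, and they treat any file named lib.rs or main.rs as a crate root. They also assume you have already read the dependency names from Cargo.toml.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def child_module_files(parent: str, name: str) -> List[str]:
    """Candidate source files for `mod name;` declared in the file `parent`."""
    p = PurePosixPath(parent)
    # Crate roots and mod.rs files own their directory; other files own a same-named subdirectory.
    base = p.parent if p.name in ("lib.rs", "main.rs", "mod.rs") else p.parent / p.stem
    return [str(base / f"{name}.rs"), str(base / name / "mod.rs")]


def classify_use(path: str, dependencies: Iterable[str]) -> str:
    """Label a `use` path as internal, external (a Cargo dependency), or unknown."""
    first = path.split("::")[0]
    if first in ("crate", "self", "super"):
        return "internal"
    # Cargo package names may contain '-', but code refers to them with '_'.
    if first in {d.replace("-", "_") for d in dependencies}:
        return "external"
    return "unknown"  # e.g. std, or a name brought into scope some other way


if __name__ == "__main__":
    print(child_module_files("src/lib.rs", "db"))    # ['src/db.rs', 'src/db/mod.rs']
    print(child_module_files("src/net.rs", "http"))  # ['src/net/http.rs', 'src/net/http/mod.rs']
    print(classify_use("tokio_util::codec", ["tokio-util"]))  # external
```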

[Figure: Cross-Language Parsing Logic]

What You Can Learn From a Dependency Graph

Once you have generated the graph, you can apply graph theory to extract deep insights about your codebase's health.

PageRank: Identifying the "Core"

As mentioned earlier, PageRank helps you find the most influential files. In a well-architected system, the high-PageRank files should be stable interfaces or utility libraries. If a high-PageRank file is also a "hotspot" (frequently changed and highly complex), you have found a major source of technical risk. You can cross-reference these findings using the hotspot analysis demo.
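A rough version of this cross-referencing needs only two inputs: a PageRank score per file and a change count per file. The sketch below derives change counts from the output of git log --pretty=format: --name-only, which prints the paths touched by each commit. It then reports the files that land in the top k of both lists. It deliberately leaves out complexity, which a full hotspot analysis would also weigh. The function names and the top_k cutoff are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def change_counts(git_log_output: str) -> Counter:
    """Count commits per path in `git log --pretty=format: --name-only` output."""
    return Counter(line.strip() for line in git_log_output.splitlines() if line.strip())


def risky_files(rank: Dict[str, float], churn: Counter, top_k: int = 10) -> List[str]:
    """Files that are both highly ranked (load-bearing) and frequently changed."""
    top_rank = set(sorted(rank, key=rank.get, reverse=True)[:top_k])
    top_churn = {path for path, _ in churn.most_common(top_k)}
    return sorted(top_rank & top_churn)


if __name__ == "__main__":
    # Sample text in the shape produced by: git log --pretty=format: --name-only
    log = "core/db.py\napi/routes.py\n\ncore/db.py\n\ncore/db.py\ncli.py\n"
    rank = {"core/db.py": 0.45, "api/routes.py": 0.30, "cli.py": 0.25}  # hypothetical scores
    print(risky_files(rank, change_counts(log), top_k=1))  # ['core/db.py']
```

Intersecting two top-k lists keeps the sketch simple. A weighted score would rank the overlap instead of just filtering it, at the cost of choosing weights.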

Community Detection: Natural Module Boundaries

Using community detection algorithms such as the Louvain method, which greedily maximizes a score called modularity, we can identify "communities" within the graph. These are groups of files that are highly interconnected with each other but sparsely connected to the rest of the codebase. These communities often represent the "true" logical modules of your system. If these communities don't align with your directory structure, it's a sign that your folder organization is misleading.
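Louvain itself is too long for a short example, but the score it optimizes is easy to compute. The sketch below calculates the modularity Q of the partition your directories imply. Compare that value with the Q of a partition found by Louvain (for example, networkx's louvain_communities) to see how far the folder layout is from the graph's natural boundaries. A positive Q means more imports fall inside communities than a random wiring with the same degrees would produce. The helper names are hypothetical, and the sketch treats imports as undirected, unweighted edges.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple


def modularity(edges: Iterable[Tuple[str, str]], community: Dict[str, str]) -> float:
    """Newman modularity Q of a partition, treating imports as undirected, unweighted edges.

    Q = sum over communities c of  L_c / m  -  (d_c / (2m))^2
    where m is the number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}  # drop duplicates and self-imports
    m = len(undirected)
    if m == 0:
        return 0.0
    inside: Dict[str, int] = defaultdict(int)
    degree: Dict[str, int] = defaultdict(int)
    for edge in undirected:
        a, b = tuple(edge)
        ca, cb = community[a], community[b]
        degree[ca] += 1
        degree[cb] += 1
        if ca == cb:
            inside[ca] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def directory_partition(files: Iterable[str]) -> Dict[str, str]:
    """Assign each file to its parent directory: the 'community' the folder layout implies."""
    return {f: str(PurePosixPath(f).parent) for f in files}


if __name__ == "__main__":
    edges = [
        ("auth/a.py", "auth/b.py"), ("auth/b.py", "auth/c.py"), ("auth/c.py", "auth/a.py"),
        ("db/x.py", "db/y.py"), ("db/y.py", "db/z.py"), ("db/z.py", "db/x.py"),
        ("auth/a.py", "db/x.py"),
    ]
    files = {f for e in edges for f in e}
    print(round(modularity(edges, directory_partition(files)), 3))  # 0.357
```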

Cycle Analysis: Circular Dependencies to Break

Cycles are the enemy of modularity. If A -> B -> C -> A, you cannot test A in isolation without also pulling in B and C. Repowise automatically detects these cycles, providing a clear list of edges that need to be removed to return the graph to a Directed Acyclic Graph (DAG) state.
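One simple way to produce such a list is a depth-first search that records back edges, meaning imports that point to a file still on the current DFS path. Removing every back edge always leaves a DAG. It is not guaranteed to be the smallest possible set of edges, because finding that minimum feedback arc set is NP-hard. The sketch is iterative to avoid Python's recursion limit on large graphs. back_edges is a hypothetical name, not repowise's API.

```python
from typing import Dict, List, Tuple

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def back_edges(graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return import edges that close a cycle; deleting all of them leaves a DAG."""
    color: Dict[str, int] = {}
    for src, targets in graph.items():
        color.setdefault(src, WHITE)
        for t in targets:
            color.setdefault(t, WHITE)
    found: List[Tuple[str, str]] = []
    for root in list(color):
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, children = stack[-1]
            for nxt in children:
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break  # descend first; this node's iterator resumes later
                if color[nxt] == GRAY:
                    found.append((node, nxt))  # points back into the current path
            else:
                # All children explored: this file is finished.
                color[node] = BLACK
                stack.pop()
    return found


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["b"]}
    print(back_edges(graph))  # [('c', 'a'), ('d', 'b')]
```

Which edges are reported depends on the DFS order. A team may still prefer to cut a different edge in each cycle, for example one that points from a low-level module up to a high-level one.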

Fan-In/Fan-Out: Interface vs. Implementation

  • Fan-In: How many files import this file? High fan-in suggests a stable, reusable component (an interface).
  • Fan-Out: How many files does this file import? High fan-out suggests a "coordinator" or "orchestrator" (an implementation).

A file with both high fan-in and high fan-out is often a "God Object" that needs to be decomposed.
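Both metrics fall straight out of the edge list. In this sketch, the thresholds that make a file a "God Object" candidate are hypothetical knobs you would tune per codebase, not values the article prescribes. The function names are also illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def fan_metrics(graph: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each file to (fan_in, fan_out): distinct importers and distinct imports."""
    fan_in: Counter = Counter()
    for targets in graph.values():
        for t in set(targets):
            fan_in[t] += 1
    nodes = set(graph) | set(fan_in)
    return {n: (fan_in[n], len(set(graph.get(n, ())))) for n in nodes}


def god_object_candidates(graph: Dict[str, List[str]], min_in: int, min_out: int) -> List[str]:
    """Files whose fan-in and fan-out both meet the (hypothetical) thresholds."""
    return sorted(
        n for n, (fi, fo) in fan_metrics(graph).items() if fi >= min_in and fo >= min_out
    )


if __name__ == "__main__":
    graph = {
        "a.py": ["hub.py"],
        "b.py": ["hub.py"],
        "hub.py": ["db.py", "cache.py", "log.py"],
    }
    print(fan_metrics(graph)["hub.py"])                      # (2, 3)
    print(god_object_candidates(graph, min_in=2, min_out=3))  # ['hub.py']
```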

Building Dependency Graphs with repowise

Repowise is designed to be the "intelligence layer" for your codebase. It doesn't just show you a pretty picture; it provides the data infrastructure to query your code's structure.

Running the Analysis

Because repowise is self-hostable and open-source (AGPL-3.0), you can run it locally or in your CI/CD pipeline. The analysis engine scans your repository, builds the ASTs, and populates a graph database.

# Example: Running repowise analysis via CLI
repowise analyze ./my-project --output ./report

Visualizing the Graph

The repowise dashboard provides an interactive visualization where you can filter by directory, search for specific files, and highlight dependency paths. This is particularly useful during large-scale refactors where you need to see exactly what will be affected by a change.

Using get_dependency_path()

One of the most powerful features of the repowise MCP server is the get_dependency_path() tool. This allows AI agents (like Claude Code or Cursor) to programmatically find the connection between two parts of your codebase.

If you ask an AI, "How does the Auth service interact with the Database?", the agent uses this tool to find the shortest path in the dependency graph, providing it with the exact context it needs to answer your question accurately.
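A breadth-first search is the textbook way to find a shortest path in an unweighted graph like this one. The sketch below is not the get_dependency_path() implementation, whose internals the article does not describe. It assumes a dict-of-imports representation and follows edges only in the import direction. shortest_dependency_path and the file names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_dependency_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Fewest-hops chain of imports from start to goal, or None if goal is unreachable."""
    previous: Dict[str, Optional[str]] = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk the predecessor links back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    graph = {
        "auth/service.py": ["auth/tokens.py", "db/session.py"],
        "auth/tokens.py": ["db/models.py"],
        "db/session.py": ["db/models.py"],
    }
    print(shortest_dependency_path(graph, "auth/service.py", "db/models.py"))
    # ['auth/service.py', 'auth/tokens.py', 'db/models.py']
```

If the reverse direction also matters ("what depends on the Database?"), running the same search on a graph with every edge flipped answers that question.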

[Figure: get_dependency_path() Tool Output]

Practical Applications

How do you use this in your day-to-day engineering work?

Architecture Reviews

During an architecture review, the dependency graph serves as the "source of truth." Instead of arguing over what the architecture should be, you can look at what it is. It highlights "architectural drift"—where developers have taken shortcuts and bypassed established boundaries.

Refactoring Planning

Before you start refactoring, run a dependency analysis. If you want to move a module, the graph will show you every file that imports it and therefore needs an updated import statement. If you want to delete a "zombie" package, the graph will confirm whether anything still imports it. Checking reachability from your entry points will also surface any lingering files that nothing uses. To see how this looks in a real project, check out the auto-generated docs for FastAPI.
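Both checks are short graph queries. In this hypothetical sketch, importers_of answers the "who must update their imports?" question for a move, and unreachable lists files that no entry point can reach. Treat the second list as candidates rather than proof. Plugins, dynamic imports, and test files may be loaded in ways a static graph does not see.

```python
from collections import deque
from typing import Dict, Iterable, List


def importers_of(graph: Dict[str, List[str]], target: str) -> List[str]:
    """Files that import `target` directly; each needs an edited import if `target` moves."""
    return sorted(src for src, targets in graph.items() if target in targets)


def unreachable(graph: Dict[str, List[str]], entry_points: Iterable[str]) -> List[str]:
    """Files that no entry point reaches by following imports: candidates for deletion."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    seen = set(entry_points)
    queue = deque(seen)
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(nodes - seen)


if __name__ == "__main__":
    graph = {
        "main.py": ["app/core.py"],
        "app/core.py": ["app/util.py"],
        "legacy/old.py": ["app/util.py"],
    }
    print(importers_of(graph, "app/util.py"))  # ['app/core.py', 'legacy/old.py']
    print(unreachable(graph, ["main.py"]))     # ['legacy/old.py']
```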

Understanding Unfamiliar Codebases

For new hires, the dependency graph is the ultimate onboarding tool. By following the edges from the entry point (e.g., main.go or index.ts), they can trace the flow of data and control through the system, significantly reducing the time it takes to make their first meaningful contribution.

Key Takeaways

Building and maintaining a code dependency graph is essential for managing the complexity of modern software. By moving beyond simple text searches and embracing AST-based graph analysis, engineering teams can:

  1. Reduce Risk: Identify load-bearing files using PageRank and protect them with better testing.
  2. Improve Health: Detect and break circular dependencies that stifle modularity.
  3. Enhance AI Workflows: Use tools like the repowise MCP server to give AI agents the structural context they need to be truly effective.
  4. Onboard Faster: Provide a visual map for new developers to navigate the codebase.

Whether you are managing a monolith or a distributed system, understanding your dependencies is the first step toward architectural excellence. To see how repowise can help you map your own codebase, explore our architecture page or try it out on one of our live examples.


FAQ: Code Dependency Graphs

Q: Can I generate a dependency graph for a polyglot repository? A: Yes. Tools like repowise parse multiple languages (Python, TS, Go, Rust, etc.) simultaneously, allowing you to see dependencies even in complex, multi-language environments.

Q: How does this differ from a simple import linter? A: Linters check for specific rule violations (like "don't import X from Y"). A dependency graph provides a holistic view of the entire system, enabling global analysis like PageRank and community detection that linters cannot perform.

Q: Is the graph updated automatically? A: When integrated into your CI/CD or used via the repowise platform, the graph is regenerated on every commit, ensuring your documentation and intelligence stay "fresh."

Try repowise on your repo

One command indexes your codebase.