Audit a Flask monolith before acquisition with graph, history, wiki, ADRs

repowise team··12 min read
pre-acquisition code audit

The first thing we look for in a pre-acquisition code audit is not “is the repo clean?” It is “what will surprise the buyer after week two?” In a Flask monolith, that answer usually hides in coupling, churn, stale docs, and decisions that survived long after the code changed.

A repo-wide read-through is the wrong first move because it rewards linearity in a system that is not linear. Flask applications accrete routes, blueprints, utility modules, and background jobs in ways that make top-down reading feel productive while missing the real cost centers: a tiny helper with half the app hanging off it, a stable-looking package that churns every week, or a wiki page everyone trusts that no longer matches the code. For a pre-acquisition code audit, you want evidence about blast radius, ownership seams, and decision drift before you ask a human to explain the architecture.

A code search tells you where strings live; an acquisition audit needs blast radius, ownership seams, and stale assumptions

Code search answers “where is this symbol referenced?” That is useful, but it is not diligence. A buyer, a rewrite team, or a lead deciding whether to inherit a service needs four different questions answered at once: what depends on what, what actually changes, what documentation is stale, and which decisions still hold.

That is why a pre-acquisition code audit should be built from four layers instead of one search box:

graph intelligence

FOUR LAYERS OF ACQUISITION SIGNALFOUR LAYERS OF ACQUISITION SIGNAL

LayerQuestion answeredSignal to look forAcquisition implication
Dependency graphWhat depends on what?Entry points, callers/callees, SCCs, high betweennessSmall files can still be rewrite blockers
Git historyWhat actually changes?Hotspots, ownership %, co-change pairs, bus factor“Stable” code may be operationally expensive
Wiki freshnessCan we trust the docs?Freshness score, confidence, stale pagesDocs can confirm or mislead; staleness is a risk signal
ADRsWhy is it built this way?Decision records linked to graph nodesIntent outranks prose, but still needs verification

That table is the whole thesis. A pre-acquisition code audit is a search for hidden maintenance cost, not a ceremonial inventory of files.

Flask makes this especially necessary because the architecture often looks friendly from the outside. Blueprints break the app into logical pieces, but they do not tell you which pieces are actually coupled, which ones are on the critical path, or which utility module is quietly shared by half the request stack. security review on a Python monolith is the kind of post that catches the obvious problems; a diligence pass has to catch the expensive ones.

Use the dependency graph to map the Flask monolith the way an acquirer would: entry points, communities, and hidden blast radius

The graph layer starts with a tree-sitter file and symbol graph, then adds call resolution, communities, and execution flow. In practice, that means you stop reading the repo as folders and start reading it as neighborhoods.

For a Flask monolith, the first pass is simple:

  • find entry points: app factory, blueprint registration, CLI commands, background job entry points
  • trace callers and callees out from those points
  • look for strongly connected components, because cycles are where rewrites go to die
  • rank modules by PageRank or betweenness to find glue code that sits between otherwise separate areas

A tiny module can still own a lot of downstream surface area. This is the kind of thing top-down reading misses.

# app/utils/url_state.py
from flask import request

def current_tenant_id() -> str | None:
    return request.headers.get("X-Tenant-ID")

def tenant_filter(query):
    tenant_id = current_tenant_id()
    if tenant_id:
        return query.filter_by(tenant_id=tenant_id)
    return query

On disk, that looks like a harmless utility. In a graph, it may sit on the path from almost every route to almost every query builder. If tenant_filter() is imported by billing, search, admin, and exports, the blast radius is not in the file size. It is in the fact that changing it changes request semantics across the app.

That is the part a pre-acquisition code audit needs to surface: a small helper with high downstream blast radius is more expensive than a large, isolated module.

SMALL FILE, LARGE BLAST RADIUSSMALL FILE, LARGE BLAST RADIUS

What surprised us when we started doing this kind of audit is how often the “important” file was not the one people named in meetings. The file with the loudest reputation was frequently a leaf. The file with the quietest name was frequently the choke point. That is why graph metrics matter more than folklore.

If you want a practical way to think about it, a high-betweenness module is the place where a rewrite team will accidentally recreate the old system’s coupling unless they notice it early. An SCC is not just a graph fact; it is a warning that the code already behaves like a subsystem, whether the repo admits it or not.

Read git history for churn, not folklore: the hot paths are usually not the files people name in meetings

History changes the question from “what is important?” to “what keeps changing, and why?” That is a much better question for a pre-acquisition code audit because maintenance cost tends to hide in repeated edits, not in dramatic architecture diagrams.

The useful signals are:

  • hotspot scoring as churn × complexity
  • ownership percentage, to see whether one person or one team carries the module
  • co-change pairs, to show which files move together
  • bus factor, to identify fragile knowledge concentration
  • significant commit messages, to separate product churn from structural churn

The default history window should be tunable, not treated like doctrine. Sometimes 500 commits is enough to expose the patterns. Sometimes you need less, sometimes more, depending on how active the repo is and how far back the acquisition question reaches. The point is not the number. The point is that the window should match the diligence question.

git hotspots and ownership

Here is the sort of history pattern that changes a buyer’s view quickly: a payments.py module with only a modest file count, but repeated edits every week because it co-changes with auth and request handling. In code review, it looks contained. In history, it is where the app’s policy surface keeps getting renegotiated.

PathCode impressionHistory signalAcquisition implication
app/payments.pySmall, readable, “done”High churn, repeated co-change with auth.py and request.pyHidden policy coupling, likely rewrite risk
app/admin/views.pyLarge, messy, obviousModerate churn, concentrated ownershipPainful but at least legible
app/utils/cache.pyTiny helperLow file count, high hotspot scoreShared failure point, test before trusting

The co-change pattern matters more than the anecdote. If the same files keep moving together, the repo is telling you that the architecture is not where the folder structure says it is. It is somewhere else, usually in a utility module, a decorator, or a request wrapper.

This is also where a pre-acquisition code audit stops being a static read and becomes a time-series read. A codebase that “looks stable” can still be expensive if the same paths are absorbing repeated business-rule changes. That is not technical debt in the abstract. It is future headcount.

When wiki freshness and ADRs disagree, trust the decision record first and use the wiki as a staleness signal

Documentation is where diligence often gets lazy. People ask, “Is there a wiki?” when they should ask, “Can I trust it?” Freshness scoring exists for exactly that reason. A wiki page that is stale is not just incomplete; it is evidence that the team’s mental model may have drifted away from the code.

The hierarchy I use is simple:

  1. ADRs first, because they capture intent
  2. code second, because it is the current executable truth
  3. wiki third, because it is useful only when freshness and confidence say so

wiki freshness scoring

If an ADR says “we moved auth decisions into the request layer to support multi-tenant isolation,” but the wiki still describes a service-level check, the ADR outranks the prose. If the code now bypasses both, then the real answer is “the decision was made, then partially undone, then forgotten.” That is the kind of drift a pre-acquisition code audit should surface before anyone signs.

A reasonable conflict rule is:

  • if ADR and code agree, trust both
  • if ADR and code disagree, trust code for current behavior and ADR for intent
  • if wiki disagrees with either, treat the wiki as a staleness signal, not a source of truth
  • if the ADR is stale too, verify against git history before you decide the decision still holds

ADRs are not magic. A stale ADR can be worse than no ADR because it gives the illusion of intentionality. But when ADRs are linked to graph nodes, they become useful in a very specific way: they let you ask whether the current coupling matches the original design decision, or whether the code drifted while the document sat still.

ADR, WIKI, CODE DISAGREEADR, WIKI, CODE DISAGREE

A worked acquisition audit on a Flask service: what we would sample, what we would ignore, and what would trigger a rewrite recommendation

Here is the kind of sampling pass I would do on a Flask monolith before acquisition or rewrite. Not every module needs equal attention. You want a few strategically chosen samples that tell you whether the repo is fundamentally understandable or fundamentally sticky.

ModuleGraph positionHistory signalWiki / ADR signalWhat I would askDecision hint
app/routes/billing.pyEntry-point adjacent, many callersHigh churn, repeated co-change with authWiki fresh, ADR exists“Why does billing own auth checks?”Sample more
app/utils/request.pyHigh betweenness, low surface areaModerate churn, many ownersWiki stale, ADR stale“What changed in request semantics?”Rewrite candidate if drift is real
app/models/invoice.pyInside one community, low betweennessLow churn, concentrated ownershipDocs mostly aligned“Is this a stable domain boundary?”Safe to inherit
app/admin/views.pyCommunity boundary, multiple dependentsHigh churn, bus factor lowWiki outdated“Who can safely modify this?”Sample more
app/legacy/flags.pySCC-adjacent, weird dependenciesHotspot, old co-change pairsNo ADR, stale wiki“Is this still active policy?”Rewrite candidate

The sample questions matter because each layer answers a different part of the acquisition problem:

  • graph: where does this module sit, and how much does it touch?
  • history: how often does it move, and with whom?
  • wiki: can we trust the current explanation?
  • ADRs: why was it built this way in the first place?

A pre-acquisition code audit should end with one of four outputs:

  • safe to acquire: graph is clean enough, churn is localized, docs are fresh, ADRs match code
  • safe to inherit: there is complexity, but it is legible and ownership is not trapped
  • rewrite candidate: high blast radius plus high churn plus stale or missing decisions
  • needs deeper sampling: the signals conflict, so the sample set is too small to sign off

The common failure mode is to stop after the first clean-looking module. Don’t. One stable package does not redeem a monolith if the request layer, auth layer, and “small” utilities are all coupled in ways the folder tree hides.

We initially got this wrong by over-weighting documentation quality. A repo can have polished docs and still be a terrible acquisition if the history says the same files absorb every policy change. Fresh docs are nice. Fresh docs that line up with low churn and sane graph structure are what matter.

What we would ask before green-lighting the deal: the minimum evidence set for acquisition, inheritance, or rewrite

If I were in the room making a call, I would want a minimum evidence set across all four layers before I green-light a pre-acquisition code audit result.

I would ask:

  • Which modules sit on the highest betweenness paths?
  • Which modules have the highest churn × complexity hotspots?
  • Which files have concentrated ownership, and where is the bus factor low?
  • Which wiki pages are stale, and what changed since they were written?
  • Which ADRs still match the code, and which ones are only historical artifacts?
  • Which modules keep co-changing with auth, request handling, or billing?
  • Which “small” files have the largest downstream blast radius?

our pallets/flask token-efficiency benchmark

The point is not to collect every possible signal. The point is to collect enough evidence to stop guessing. Once the graph, history, wiki, and ADRs agree well enough, you can decide. Once they disagree sharply, you should escalate, sample more, or walk away from a rewrite claim that sounds cleaner than the repo actually is.

If you want a concrete rule of thumb, I use this:

  • acquire when the repo is messy but explainable
  • inherit when the coupling is real but bounded
  • rewrite when the hot paths, ownership, and decision drift all point to the same brittle center
  • stop sampling and escalate when the evidence is contradictory in more than one layer

That is the useful shape of a pre-acquisition code audit. Not a spreadsheet of warnings, not a security checklist, and definitely not a top-down read-through of a Flask monolith pretending to be a package.

FAQ

How do you do a pre-acquisition code audit on a Flask monolith?

Start with the dependency graph to find entry points, SCCs, and high-betweenness modules. Then read git history for hotspots and co-change, check wiki freshness for trust, and verify ADRs against current code. The goal is to estimate hidden maintenance cost, not to read every file.

What should I look for in a codebase before acquisition or rewrite?

Look for blast radius, churn, ownership concentration, stale documentation, and decisions that no longer match the code. A small Flask utility with many dependents is often more important than a large feature module with low coupling.

Can git history reveal hidden maintenance cost during technical due diligence?

Yes. Hotspot scoring, ownership percentage, co-change pairs, and bus factor are often better predictors of future pain than file size or folder names. A module that keeps changing with auth or request handling is usually carrying more policy than it admits.

How do wiki freshness scores and ADRs help in a code audit?

Freshness scores tell you whether the wiki is trustworthy enough to cite. ADRs tell you why the system was designed a certain way, and they should outrank wiki prose when there is conflict. If both are stale, verify against code and git history before drawing a conclusion.

Try repowise on your repo

One command indexes your codebase.