Refactoring Priorities: Picking What to Fix First

repowise team · May 20, 2026 · 13 min read

Most teams do not have a refactoring problem. They have a refactoring priorities problem. There is always more code that could be cleaned up than time to clean it. The job is to choose work that changes delivery speed, defect rate, and on-call pain, not work that only looks bad in a diff. Martin Fowler describes refactoring as small, behavior-preserving steps on code that is likely to change again. That is the right place to start. The missing piece is deciding what to refactor first. (martinfowler.com)

Why most refactor plans go stale

Refactor plans go stale for one simple reason: they are written as a wish list, not as a decision system. A cleanup backlog that says “simplify auth,” “remove duplication,” and “improve naming” does not tell you what to touch this week. It also does not tell you what to ignore. Martin Fowler’s basic rule still holds: refactor code that will change again. Code that is stable, isolated, and rarely touched is often not worth the cost. (martinfowler.com)

The second failure mode is that teams optimize for code aesthetics instead of business impact. A tidy file that nobody edits is a poor use of engineering time. A messy file on a hot path, with many dependents and weekly changes, is a better target. That distinction is the core of technical debt prioritization. CodeScene’s hotspot guidance uses the same logic: focus on areas where change frequency and weak code health meet. (codescene.com)

The third failure mode is treating refactoring as a separate project, cut off from delivery. Refactoring pays back when it is attached to active work: a bug fix, a feature, an incident, or a dependency upgrade. If you want a practical system, build one around impact per effort, not around “clean as much as possible.”

The impact-per-effort framing

The fastest way to improve refactoring priorities is to score each candidate on two axes:

Impact: how much future pain this change removes.
Effort: how long the change will take and how risky it is.

The best target is usually not the worst-looking code. It is the code with the highest impact per effort. That ratio is why a medium-sized cleanup in a busy subsystem often beats a giant rewrite in a quiet one. CodeScene’s refactoring-targets material describes the same idea: focus on the overlap between low code health and high change frequency. (codescene.com)

Estimating impact

Impact is a mix of visible and hidden costs.

Score higher when the code:

Changes often.
Sits on a hot path.
Has many dependents.
Causes repeated bugs or regressions.
Blocks tests, reviews, or safe rollout.
Appears in incident notes or support tickets.

Git history helps here. A file that changes often and pulls in other files is usually a better target than a large but static file. That is the idea behind hotspot analysis and co-change tracking. It is also why ownership maps matter: a fragile area that only one person understands carries more risk than a larger area shared by several engineers. (codescene.com)
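Co-change and ownership counts can be derived from commit history once each commit is reduced to its author and the files it touched. The sketch below assumes that parsing step has already happened. The `Commit` record, file paths, and authors are hypothetical, and real hotspot tools also handle renames, time windows, and very large commits.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Commit:
    # Hypothetical record: who committed and which files the commit touched.
    author: str
    files: tuple[str, ...]


def co_change_pairs(commits: list[Commit]) -> Counter:
    """Count how often each pair of files appears in the same commit."""
    pairs: Counter = Counter()
    for commit in commits:
        # Sort and dedupe so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(commit.files)), 2):
            pairs[pair] += 1
    return pairs


def owners_per_file(commits: list[Commit]) -> dict[str, set[str]]:
    """Collect the distinct authors who have touched each file."""
    owners: dict[str, set[str]] = {}
    for commit in commits:
        for path in commit.files:
            owners.setdefault(path, set()).add(commit.author)
    return owners


history = [
    Commit("ana", ("billing/invoice.py", "billing/tax.py")),
    Commit("ana", ("billing/invoice.py", "billing/tax.py", "api/routes.py")),
    Commit("ben", ("api/routes.py", "docs/readme.md")),
]

# The billing files change together in two of the three commits.
print(co_change_pairs(history).most_common(1))

# Files with exactly one author are the single-owner risk the text describes.
single_owner = [f for f, who in owners_per_file(history).items() if len(who) == 1]
print(sorted(single_owner))
```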

A simple impact rubric:

Signal	Low	Medium	High
Change frequency	Rare	Monthly	Weekly or more
Downstream blast radius	Few dependents	Some dependents	Many dependents
Bug history	None	Occasional	Repeated
Delivery drag	Mild	Noticeable	Constant
Team knowledge	Broad	Partial	Single-owner

If a candidate scores high in three or more rows, it deserves attention.
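The rubric and its “three or more rows” rule can be encoded directly. In this sketch, the row names, the `Level` enum, and the cutoff of four changes per month for “weekly or more” are assumptions. Only the change-frequency row is mapped automatically. The other rows are set by hand.

```python
from enum import Enum


class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Row names follow the impact rubric table; the level for each row is a judgment call.
RUBRIC_ROWS = (
    "change_frequency",
    "blast_radius",
    "bug_history",
    "delivery_drag",
    "team_knowledge",
)


def change_frequency_level(changes_per_month: float) -> Level:
    """Map a change rate onto the table: rare, monthly, weekly or more."""
    # Assumption: "weekly or more" means roughly four or more changes per month.
    if changes_per_month >= 4:
        return Level.HIGH
    if changes_per_month >= 1:
        return Level.MEDIUM
    return Level.LOW


def deserves_attention(levels: dict[str, Level], threshold: int = 3) -> bool:
    """Apply the rule of thumb: High in three or more rows."""
    unknown = set(levels) - set(RUBRIC_ROWS)
    if unknown:
        raise ValueError(f"unknown rubric rows: {sorted(unknown)}")
    highs = sum(1 for level in levels.values() if level is Level.HIGH)
    return highs >= threshold


checkout = {
    "change_frequency": change_frequency_level(6),
    "blast_radius": Level.HIGH,
    "bug_history": Level.HIGH,
    "delivery_drag": Level.MEDIUM,
    "team_knowledge": Level.LOW,
}
print(deserves_attention(checkout))  # True: three rows are High
```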

Estimating effort

Effort is not just lines of code. A small change in a brittle area can cost more than a larger change in a clean one.

Estimate effort by looking at:

Number of files touched.
Test coverage around the area.
Depth of dependency chain.
Number of public APIs changed.
Migration work for callers.
Review complexity.
Need for phased rollout.

Martin Fowler’s refactoring guidance emphasizes small steps because each step should preserve behavior. That is not only a safety rule. It also controls cost. The more you can split a change into safe increments, the lower the effort score. (martinfowler.com)

A good heuristic:

Low effort: one module, strong tests, no external API changes.
Medium effort: several modules, some tests, caller updates.
High effort: public interface changes, migration scripts, cross-team coordination.
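The effort heuristic maps onto a small classifier. The `ChangeShape` fields are a hypothetical way to describe a proposed change. Any case the heuristic does not name, such as one module with weak tests, falls into medium by assumption.

```python
from dataclasses import dataclass


@dataclass
class ChangeShape:
    # Hypothetical description of a proposed refactor; fields mirror the heuristic.
    modules_touched: int
    strong_tests: bool
    changes_public_api: bool
    needs_migration: bool
    teams_involved: int


def effort_bucket(shape: ChangeShape) -> str:
    """Classify effort as low, medium, or high using the heuristic above."""
    # Public interface changes, migrations, or other teams dominate the cost.
    if shape.changes_public_api or shape.needs_migration or shape.teams_involved > 1:
        return "high"
    # One well-tested module with no external API changes is the cheap case.
    if shape.modules_touched <= 1 and shape.strong_tests:
        return "low"
    # Everything in between: several modules, partial tests, internal caller updates.
    return "medium"


print(effort_bucket(ChangeShape(1, True, False, False, 1)))   # low
print(effort_bucket(ChangeShape(3, False, False, False, 1)))  # medium
print(effort_bucket(ChangeShape(2, True, True, True, 2)))     # high
```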

A scoring rubric you can run by hand

You do not need a tool to start. A whiteboard and a spreadsheet are enough.

Use a 1–5 scale for each dimension.

Step 1: score impact

Give each target a score from 1 to 5 for:

Change frequency
Bug count
Dependency fan-out
Incident involvement
Team friction

Add the scores. Maximum is 25.

Step 2: score effort

Give each target a score from 1 to 5 for:

Files touched
API surface
Test gaps
Migration work
Coordination overhead

Add the scores. Maximum is 25.

Step 3: compute priority

Use this formula:

priority = impact / effort

Higher is better.

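If you want a version that avoids division, multiply impact by 2 and subtract effort:

priority = (impact × 2) - effort

That version is easier to do by hand during a planning meeting. It also breaks ties between candidates that share the same ratio. Because it weights raw impact more heavily, it will not always produce the same order as the ratio.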
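Steps 1 to 3 fit in a few lines of Python. This sketch checks that every score falls between 1 and 5, sums the five impact and five effort dimensions, and computes both formulas. The dimension keys are shortened forms of the step lists. The candidate names and scores are made up. Sorting by ratio first and using the hand-computed score as a tie-breaker is one reasonable choice among several.

```python
from dataclasses import dataclass

IMPACT_KEYS = ("change_frequency", "bug_count", "fan_out", "incidents", "team_friction")
EFFORT_KEYS = ("files_touched", "api_surface", "test_gaps", "migration", "coordination")


def total(scores: dict[str, int], keys: tuple[str, ...]) -> int:
    """Sum five 1-5 scores, so the result is between 5 and 25."""
    missing = [k for k in keys if k not in scores]
    if missing:
        raise ValueError(f"missing scores: {missing}")
    for key in keys:
        if not 1 <= scores[key] <= 5:
            raise ValueError(f"{key} must be between 1 and 5, got {scores[key]}")
    return sum(scores[key] for key in keys)


@dataclass
class Candidate:
    name: str
    impact: dict[str, int]
    effort: dict[str, int]

    @property
    def ratio(self) -> float:
        # Step 3: impact / effort. Effort is at least 5, so this never divides by zero.
        return total(self.impact, IMPACT_KEYS) / total(self.effort, EFFORT_KEYS)

    @property
    def hand_score(self) -> int:
        # Alternative: (impact x 2) - effort, easier to do in a meeting.
        return 2 * total(self.impact, IMPACT_KEYS) - total(self.effort, EFFORT_KEYS)


candidates = [
    Candidate("shared utils",
              dict(change_frequency=5, bug_count=3, fan_out=5, incidents=2, team_friction=4),
              dict(files_touched=3, api_surface=2, test_gaps=2, migration=1, coordination=1)),
    Candidate("billing rewrite",
              dict(change_frequency=4, bug_count=5, fan_out=4, incidents=5, team_friction=4),
              dict(files_touched=5, api_surface=5, test_gaps=4, migration=5, coordination=5)),
]

# Rank by ratio, then by the hand score to break ties.
for c in sorted(candidates, key=lambda c: (c.ratio, c.hand_score), reverse=True):
    print(f"{c.name}: ratio={c.ratio:.2f} hand_score={c.hand_score}")
```

With these made-up scores, the shared utility ranks first (19 / 9, about 2.11) and the billing rewrite second (22 / 24, about 0.92). The bigger, noisier rewrite loses on impact per effort.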

Step 4: filter out bad candidates

Some refactors should be postponed even if the score looks good:

Code that will be deleted soon.
Code with no planned changes.
Code blocked by a wider architecture change.
Cleanup that only improves style.
Changes that would hide a bigger product or domain issue.

This is where a refactoring decision framework helps. It keeps you from wasting cycles on code that feels messy but does not move the system forward.
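Step 4 works best as a hard filter applied before ranking, so a high score cannot rescue a postponed candidate. The flag names below are hypothetical labels for the five postponement rules, set by whoever reviews the list.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    priority: float
    # Hypothetical flags a team sets during review; none come from a specific tool.
    flags: set[str] = field(default_factory=set)


# One flag per postponement rule in Step 4.
POSTPONE_FLAGS = {
    "scheduled_for_deletion",
    "no_planned_changes",
    "blocked_by_architecture_change",
    "style_only",
    "hides_domain_issue",
}


def shortlist(targets: list[Target]) -> list[Target]:
    """Drop postponed targets, then rank the rest by priority, highest first."""
    kept = [t for t in targets if not (t.flags & POSTPONE_FLAGS)]
    return sorted(kept, key=lambda t: t.priority, reverse=True)


targets = [
    Target("legacy export job", 3.1, {"scheduled_for_deletion"}),
    Target("checkout validation", 2.4),
    Target("rename helpers", 2.9, {"style_only"}),
    Target("auth session store", 1.8),
]
print([t.name for t in shortlist(targets)])  # ['checkout validation', 'auth session store']
```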

Step 5: pick a time box

Do not pick “the refactor.” Pick a slice of it.

Examples:

“Extract payments validation from checkout.”
“Split one 1,500-line module into three packages.”
“Remove direct DB access from the API layer.”
“Break the shared util into domain-specific helpers.”

Smaller targets create a shorter feedback loop and a lower chance of half-finished work.

What a practical refactoring decision framework looks like

A good framework answers five questions:

1. Will this code change again soon?
2. How much pain does it cause today?
3. How many systems depend on it?
4. Can I reduce risk with tests or seams first?
5. Can I finish this in a short, reviewable slice?

If the answer to 1 is no, pause. If the code causes real pain today (2), has many dependents (3), or can be made safe with tests or seams first (4), the target moves up the list. If the answer to 5 is no, split the work.
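The five questions reduce to a small decision function if each answer is treated as yes or no. That reduction is an assumption, because questions 2 and 3 are really matters of degree. The “keep in backlog” fallback for candidates that trigger none of the rules is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    # The five questions, each reduced to a boolean for this sketch.
    will_change_soon: bool      # 1. Will this code change again soon?
    causes_pain_today: bool     # 2. Does it cause real pain today?
    many_dependents: bool       # 3. Do many systems depend on it?
    can_make_safe_first: bool   # 4. Can risk be reduced with tests or seams first?
    fits_short_slice: bool      # 5. Can it be finished in a short, reviewable slice?


def decide(a: Answers) -> list[str]:
    """Return the actions the framework implies for one candidate."""
    if not a.will_change_soon:
        return ["pause"]
    actions = []
    if a.causes_pain_today or a.many_dependents or a.can_make_safe_first:
        actions.append("move up")
    if not a.fits_short_slice:
        actions.append("split the work")
    # Assumed fallback when no rule fires.
    return actions or ["keep in backlog"]


print(decide(Answers(True, True, False, False, False)))  # ['move up', 'split the work']
print(decide(Answers(False, True, True, True, True)))    # ['pause']
```

The rules are not exclusive. A painful target that is too big to finish quickly both moves up and gets split.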

Here is the version I use in practice:

Question	Why it matters	What to look for
Will it change again?	Refactoring pays back on future edits	Active feature area, frequent bug fixes
Does it create repeated pain?	Pain is a signal, not a feeling	Slow reviews, regressions, fragile tests
What is the blast radius?	High fan-out means high leverage	Many dependents, shared utilities
Can I make it safe?	Safety reduces effort	Good tests, feature flags, seam extraction
Can I finish quickly?	Time-boxing keeps the work real	Clear boundaries, one-owner scope

This is where repowise becomes useful. Its hotspot analysis demo shows how change frequency and code health combine into a concrete target list, and its architecture page explains how those signals are assembled into a repo-wide view. If you want to see that data shape on a real project, the FastAPI dependency graph demo makes the hidden coupling obvious. Those views are built for this decision problem. (codescene.com)

Doing it automatically

Manual scoring works for a few dozen targets. It breaks down when the repo grows and the refactor list keeps moving.

Automation helps in three places:

Discovery: find files and modules that deserve review.
Ranking: sort candidates by expected impact.
Explanation: show why a target is hot, risky, or hard to change.

That is where code intelligence tools earn their keep. The Model Context Protocol is now an open standard for connecting AI systems to data sources and tools, with an official spec and SDKs published by the MCP project. OpenAI also documents MCP support for its developer tools and remote servers. (anthropic.com)

For refactoring work, the point is not “AI writes the fix.” The point is that an agent can ask for the right context before it guesses. A repo with file-level docs, dependency paths, ownership, churn, and hotspot data gives you a better shortlist than grep ever will.
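Discovery can start with nothing more than git. The sketch below counts how many commits touched each file over a time window by running `git log --name-only` with an empty `--pretty=format:`, which prints only file names. The 90-day default window and the function names are illustrative choices. The sketch does not follow renames.

```python
import subprocess
from collections import Counter


def parse_name_only_log(log_text: str) -> Counter:
    """Count how many commits touched each path in `git log --name-only` output."""
    # With an empty --pretty format, git prints only file names, with blank
    # lines between commits, so every non-blank line is one file in one commit.
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def churn(repo: str = ".", since: str = "90 days ago") -> Counter:
    """Run git log in `repo` and return per-file change counts since `since`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--name-only", "--pretty=format:"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_name_only_log(out)
```

Calling `churn().most_common(10)` from inside a repository lists the ten most frequently changed files. That list is a raw discovery signal, not a ranking, until it is combined with health and dependency data.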

repowise refactoring-targets view

Repowise combines four signals that matter for refactoring priorities:

Auto-generated docs for modules and symbols.
Git intelligence for churn, ownership, and co-change patterns.
Dependency graphs for fan-out and hidden coupling.
Code health biomarkers for risk and maintainability.

That combination is useful because it cuts through the usual argument. You do not need to debate whether a package is “messy.” You can inspect why it is expensive to change. See the auto-generated docs for FastAPI for what that context looks like, then compare it with the ownership map for Starlette to see how single-owner areas stand out in git history. (codescene.com)

A practical workflow looks like this:

Open the hotspot view.
Sort by risk or code health.
Check dependents and co-change partners.
Confirm the module still changes often.
Pick the smallest slice that removes the most friction.
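The filtering and sorting steps of that workflow can be sketched over per-file signals. The `FileStats` fields, the 1-to-10 health scale, the five-commit activity threshold, and the risk score (churn multiplied by 10 minus health) are all assumptions for illustration. None of them describe repowise's actual data model or scoring.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    # Hypothetical per-file signals; a code intelligence tool would supply real values.
    path: str
    commits_90d: int  # changes in the last 90 days
    health: float     # assumed scale: 1 (poor) to 10 (healthy)
    dependents: int   # number of modules that import this file


def hotspots(files: list[FileStats], min_commits: int = 5, top: int = 3) -> list[FileStats]:
    """Keep files that still change often, then rank by churn times poor health."""
    # Confirm the module still changes often before spending effort on it.
    active = [f for f in files if f.commits_90d >= min_commits]

    def risk(f: FileStats) -> tuple[float, int]:
        # Frequent change in unhealthy code scores highest; dependents break
        # ties because they widen the blast radius.
        return (f.commits_90d * (10 - f.health), f.dependents)

    return sorted(active, key=risk, reverse=True)[:top]


files = [
    FileStats("core/utils.py", 40, 4.0, 30),
    FileStats("billing/invoice.py", 12, 3.0, 6),
    FileStats("admin/export.py", 1, 2.0, 0),
    FileStats("api/routes.py", 25, 8.5, 12),
]
for f in hotspots(files):
    print(f.path)
```

The unhealthy but inactive export script drops out, which matches the advice to skip static code. The last step, picking the smallest slice that removes the most friction, stays a human judgment.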

If you want the tooling path, try repowise on your own repo or start with pip install repowise && repowise init. The MCP server is configured automatically, so an agent can query the repo context without extra wiring. (platform.openai.com)

What good first-quarter targets look like

The best first-quarter targets are boring in the right way. They are not heroic rewrites. They are the pieces that keep getting in the way.

Good candidates usually have these traits:

They are active.
They are small enough to finish.
They sit on a path that many changes touch.
They cause repeated review comments or bugs.
They have a clear seam for extraction.

Strong examples

A shared utility file that became a junk drawer
- High fan-out.
- Lots of unrelated helpers.
- Changes every week.
A package that handles too many concerns
- Parsing, validation, persistence, and I/O in one place.
- Hard to test in isolation.
- Every feature request lands there.
A critical path with repeated regressions
- Checkout, auth, billing, sync, or scheduling logic.
- The code may not be large, but it is expensive to touch.
A module with hidden ownership
- Only one engineer knows it.
- Every change needs tribal knowledge.
- Bus factor is low.
An interface with obvious duplication across callers
- Same validation repeated in multiple services.
- Easy to extract, easy to test, easy to review.

Weak examples

A static admin script nobody runs.
Dead code scheduled for deletion.
A pretty file with no future work.
A rewrite requested because “it feels old.”
A large subsystem that would need months of coordination before the first merged change.

A useful filter is this: if the code disappeared tomorrow, would anyone feel pain in the next 30 days? If not, it is probably not the first thing to fix.

When to refactor vs rewrite

This is one of the most common false choices in engineering planning. A rewrite feels decisive. A refactor feels slow. The right choice depends on changeability, risk, and how much of the existing system still works.

Refactor when:

The system mostly works.
You can make safe incremental changes.
The pain is local.
The code will keep changing.
You can verify behavior with tests.

Rewrite when:

The architecture is wrong for the job.
The existing design blocks basic requirements.
The domain has changed so much that adaptation is costlier than replacement.
You can run both systems during migration.
You have time, staffing, and rollback paths.

Martin Fowler’s framing is useful here: refactoring is for improving the design of existing code through small, behavior-preserving steps. That means a rewrite is a different bet. It replaces the system, while refactoring improves it piece by piece. (martinfowler.com)

A simple decision rule:

Condition	Refactor	Rewrite
Existing behavior mostly correct	Yes	Maybe
Need gradual delivery	Yes	No
Tests available	Yes	Helpful but not required
Architecture fundamentally wrong	Sometimes	Often
Team can support dual run	Sometimes	Often required

If you are unsure, start with a refactor spike. A small extraction or boundary cleanup will tell you whether the system can be shaped in place.

How to build a refactoring backlog that stays useful

A backlog only works if it stays current.

Use this operating model:

Feed it from real work. Add candidates during bugs, incidents, and feature work.
Re-score monthly. Change frequency and ownership move quickly.
Delete stale items. If the code is no longer active, remove it.
Attach targets to owners. Someone should be able to explain the next step.
Track outcomes. Measure fewer bugs, faster reviews, shorter lead time, or lower incident count.

The most valuable backlog item is often the one that keeps appearing in unrelated work. That repetition is the signal.
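Two parts of that operating model, monthly re-scoring and deleting stale items, are mechanical enough to check with a script. The sketch below assumes each backlog item records its owner, the date of the last commit to the target, and the date it was last scored. The 30-day re-score window follows “re-score monthly.” The 90-day staleness window and the sample items are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BacklogItem:
    target: str
    owner: Optional[str]
    last_code_change: date  # most recent commit touching the target
    last_scored: date


def review_backlog(items: list[BacklogItem], today: date,
                   stale_after_days: int = 90,
                   rescore_after_days: int = 30) -> dict[str, list[str]]:
    """Sort items into operating-model buckets: delete, rescore, needs_owner."""
    report: dict[str, list[str]] = {"delete": [], "rescore": [], "needs_owner": []}
    for item in items:
        if today - item.last_code_change > timedelta(days=stale_after_days):
            # The code is no longer active, so the item is stale.
            report["delete"].append(item.target)
            continue
        if today - item.last_scored > timedelta(days=rescore_after_days):
            report["rescore"].append(item.target)
        if not item.owner:
            report["needs_owner"].append(item.target)
    return report


today = date(2026, 5, 20)
items = [
    BacklogItem("shared utils", "ana", date(2026, 5, 18), date(2026, 5, 1)),
    BacklogItem("old report job", "ben", date(2025, 11, 2), date(2026, 1, 10)),
    BacklogItem("checkout validation", None, date(2026, 5, 10), date(2026, 3, 30)),
]
print(review_backlog(items, today))
```

Tracking outcomes is deliberately left out. Fewer bugs or faster reviews have to be measured against the team's own baseline, not inferred from the backlog.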

FAQ

What is refactoring prioritization?

It is the process of choosing which parts of the code to clean up first. Good refactoring priorities focus on impact per effort, not just code ugliness.

What should I refactor first?

Start with active code that changes often, causes repeated pain, and has a wide blast radius. If you can fix it in a small, safe slice, it moves to the top of the list.

How do I measure impact per effort?

Score impact using change frequency, bug history, dependency fan-out, and delivery drag. Score effort using files touched, test coverage, API changes, and migration cost. Divide impact by effort to rank candidates.

Is technical debt prioritization the same as refactoring priorities?

Close, but not identical. Technical debt prioritization covers many kinds of debt, including missing tests, poor architecture, and operational risk. Refactoring prioritization is the code-shape part of that larger problem.

When should I use a tool instead of manual scoring?

Use a tool when the repo is large, ownership is unclear, or you need evidence from history and dependency graphs. Manual scoring is fine for a small list. Automation helps when you need a repeatable refactoring decision framework across many services.

Can I use this with AI coding tools?

Yes, if the tools have the right context. MCP is an open standard for connecting AI systems to data sources and tools, and OpenAI documents support for remote MCP servers in its developer tooling. That means an agent can ask for repo context before proposing a change. (anthropic.com)

What does “done” look like for a refactor?

Done means the code is easier to change, tests still pass, the behavior is stable, and the next edit will be cheaper than before. If the code only looks cleaner but still hurts to touch, the refactor was cosmetic.

Where does repowise fit in this workflow?

Repowise turns repo context into ranked targets. It exposes docs, history, ownership, dependency paths, dead code, and code health through its own MCP server, so an agent can inspect the codebase before making a refactoring call. If you want a concrete starting point, use the live examples or the architecture page to see the data model behind the output. (platform.openai.com)

One last rule for refactoring priorities

If a change will affect a lot of future work, is easy to verify, and has a narrow scope, do it first. If it is large, risky, and far from active work, leave it alone until the signal gets stronger. That rule keeps refactoring tied to value instead of taste.

