Refactoring Priorities: Picking What to Fix First
Most teams do not have a refactoring problem. They have a refactoring priorities problem. There is always more code that could be cleaned up than time to clean it. The job is to choose work that changes delivery speed, defect rate, and on-call pain, not work that only looks bad in a diff. Martin Fowler describes refactoring as small, behavior-preserving steps on code that is likely to change again. That is the right place to start. The missing piece is deciding what to refactor first. (martinfowler.com)
Why most refactor plans go stale
Refactor plans go stale for one simple reason: they are written as a wish list, not as a decision system. A cleanup backlog that says “simplify auth,” “remove duplication,” and “improve naming” does not tell you what to touch this week. It also does not tell you what to ignore. Martin Fowler’s basic rule still holds: refactor code that will change again. Code that is stable, isolated, and rarely touched is often not worth the cost. (martinfowler.com)
The second failure mode is that teams optimize for code aesthetics instead of business impact. A tidy file that nobody edits is a poor use of engineering time. A messy file on a hot path, with many dependents and weekly changes, is a better target. That distinction is the core of technical debt prioritization. CodeScene’s hotspot guidance uses the same logic: focus on areas where change frequency and weak code health meet. (codescene.com)
The third failure mode is treating refactoring as a separate project. That usually fails. Refactoring pays back when it is attached to active work: a bug fix, a feature, an incident, or a dependency upgrade. If you want a practical system, build one around impact per effort, not around “clean as much as possible.”
The impact-per-effort framing
The fastest way to improve refactoring priorities is to score each candidate on two axes:
- Impact: how much future pain this change removes.
- Effort: how long the change will take and how risky it is.
The best target is usually not the worst-looking code. It is the code with the highest impact per effort. That ratio is why a medium-sized cleanup in a busy subsystem often beats a giant rewrite in a quiet one. CodeScene’s refactoring-targets material describes the same idea: focus on the overlap between low code health and high change frequency. (codescene.com)
Estimating impact
Impact is a mix of visible and hidden costs.
Score higher when the code:
- Changes often.
- Sits on a hot path.
- Has many dependents.
- Causes repeated bugs or regressions.
- Blocks tests, reviews, or safe rollout.
- Appears in incident notes or support tickets.
Git history helps here. A file that changes often and drags other files into the same commits is usually a better target than a large but static file. That is the idea behind hotspot analysis and co-change tracking. It is also why ownership maps matter: a fragile area that only one person understands carries more risk than a larger area shared by several engineers. (codescene.com)
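As a rough illustration of the change-frequency signal, the sketch below counts how many commits touched each file. It assumes you feed it the text produced by `git log --since="90 days ago" --name-only --pretty=format:`; the 90-day window, the function name, and the sample paths are all made up for the example.

```python
from collections import Counter


def count_changes(git_log_output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the text printed by:
        git log --since="90 days ago" --name-only --pretty=format:
    which lists changed file paths, one per line, with blank lines
    between commits.
    """
    paths = (line.strip() for line in git_log_output.splitlines())
    return Counter(path for path in paths if path)


# Hypothetical sample output standing in for a real repository.
sample_log = """\
billing/invoice.py
billing/tax.py

billing/invoice.py
api/routes.py

billing/invoice.py
"""

churn = count_changes(sample_log)
for path, commits in churn.most_common(3):
    print(f"{commits:3d}  {path}")
```

Raw commit counts are only a starting point. They say where change happens, not whether the code there is hard to change.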
A simple impact rubric:
| Signal | Low | Medium | High |
|---|---|---|---|
| Change frequency | Rare | Monthly | Weekly or more |
| Downstream blast radius | Few dependents | Some dependents | Many dependents |
| Bug history | None | Occasional | Repeated |
| Delivery drag | Mild | Noticeable | Constant |
| Team knowledge | Broad | Partial | Single-owner |
If a candidate scores high in three or more rows, it deserves attention.
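The "three or more rows" rule is easy to encode. The following is a minimal sketch, assuming each row of the rubric is rated as the string "low", "medium", or "high"; the dictionary keys and the function name are illustrative, not part of any tool.

```python
def deserves_attention(ratings: dict[str, str], threshold: int = 3) -> bool:
    """Return True when at least `threshold` rubric rows are rated 'high'."""
    high_rows = sum(1 for level in ratings.values() if level == "high")
    return high_rows >= threshold


# Hypothetical candidate. For team knowledge, "low" means broad knowledge
# and "high" means single-owner, matching the columns of the rubric.
candidate = {
    "change_frequency": "high",
    "blast_radius": "high",
    "bug_history": "medium",
    "delivery_drag": "high",
    "team_knowledge": "low",
}

print(deserves_attention(candidate))
```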
Estimating effort
Effort is not just lines of code. A small change in a brittle area can cost more than a larger change in a clean one.
Estimate effort by looking at:
- Number of files touched.
- Test coverage around the area.
- Depth of dependency chain.
- Number of public APIs changed.
- Migration work for callers.
- Review complexity.
- Need for phased rollout.
Martin Fowler’s refactoring guidance emphasizes small steps because each step should preserve behavior. That is not only a safety rule. It also controls cost. The more you can split a change into safe increments, the lower the effort score. (martinfowler.com)
A good heuristic:
- Low effort: one module, strong tests, no external API changes.
- Medium effort: several modules, some tests, caller updates.
- High effort: public interface changes, migration scripts, cross-team coordination.
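One way to make the heuristic repeatable is to write it down as a small function. This is a sketch under assumptions: the field names are invented for the example, and the rule that any high-effort signal wins over the others is my reading of the three buckets, not a stated rule.

```python
from dataclasses import dataclass


@dataclass
class Change:
    modules_touched: int
    strong_tests: bool
    caller_updates: bool
    public_api_changes: bool
    needs_migration: bool
    cross_team: bool


def effort_bucket(change: Change) -> str:
    """Map a planned change to 'low', 'medium', or 'high' effort."""
    # Any high-effort signal dominates (an assumption of this sketch).
    if change.public_api_changes or change.needs_migration or change.cross_team:
        return "high"
    # Several modules, weaker tests, or caller updates push it to medium.
    if change.modules_touched > 1 or change.caller_updates or not change.strong_tests:
        return "medium"
    # One module, strong tests, no external API changes.
    return "low"


extract_validator = Change(
    modules_touched=1,
    strong_tests=True,
    caller_updates=False,
    public_api_changes=False,
    needs_migration=False,
    cross_team=False,
)
print(effort_bucket(extract_validator))
```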
A scoring rubric you can run by hand
You do not need a tool to start. A whiteboard and a spreadsheet are enough.
Use a 1–5 scale for each dimension.
Step 1: score impact
Give each target a score from 1 to 5 for:
- Change frequency
- Bug count
- Dependency fan-out
- Incident involvement
- Team friction
Add the scores. Maximum is 25.
Step 2: score effort
Give each target a score from 1 to 5 for:
- Files touched
- API surface
- Test gaps
- Migration work
- Coordination overhead
Add the scores. Maximum is 25.
Step 3: compute priority
Use this formula:
priority = impact / effort
Higher is better.
If you want a tie-breaker, multiply impact by 2 and subtract effort:
priority = (impact × 2) - effort
That version is easier to do by hand during a planning meeting.
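The arithmetic in steps 1 to 3 can be sketched in a few lines. The candidate names and scores below are invented; the only parts taken from the text are the five 1–5 scores per dimension and the two formulas. Using the tie-breaker as a secondary sort key is one possible way to combine them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    impact_scores: list[int]  # five scores, each 1-5
    effort_scores: list[int]  # five scores, each 1-5

    def __post_init__(self) -> None:
        for scores in (self.impact_scores, self.effort_scores):
            if len(scores) != 5 or any(not 1 <= s <= 5 for s in scores):
                raise ValueError(f"{self.name}: expected five scores between 1 and 5")

    @property
    def impact(self) -> int:
        return sum(self.impact_scores)  # maximum 25

    @property
    def effort(self) -> int:
        return sum(self.effort_scores)  # maximum 25, minimum 5

    @property
    def ratio(self) -> float:
        return self.impact / self.effort

    @property
    def tie_breaker(self) -> int:
        return self.impact * 2 - self.effort


# Hypothetical candidates for illustration only.
candidates = [
    Candidate("legacy reports", [1, 2, 2, 1, 2], [4, 3, 5, 4, 4]),
    Candidate("shared utils", [5, 4, 5, 3, 4], [2, 2, 3, 1, 2]),
]

# Rank by the ratio first, then by the tie-breaker.
ranked = sorted(candidates, key=lambda c: (c.ratio, c.tie_breaker), reverse=True)
for c in ranked:
    print(f"{c.name}: ratio={c.ratio:.2f} tie_breaker={c.tie_breaker}")
```

Here "shared utils" scores 21 impact against 10 effort, so it ranks first under either formula.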
Step 4: filter out bad candidates
Some refactors should be postponed even if the score looks good:
- Code that will be deleted soon.
- Code with no planned changes.
- Code blocked by a wider architecture change.
- Cleanup that only improves style.
- Changes that would hide a bigger product or domain issue.
This is where a refactoring decision framework helps. It keeps you from wasting cycles on code that feels messy but does not move the system forward.
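The postponement checks can sit next to the score as plain flags. A minimal sketch, with hypothetical field names that mirror the five bullets:

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    scheduled_for_deletion: bool = False
    has_planned_changes: bool = True
    blocked_by_architecture_work: bool = False
    style_only: bool = False
    masks_domain_issue: bool = False


def postpone_reasons(target: Target) -> list[str]:
    """Return the reasons to postpone a target; an empty list means keep it."""
    reasons = []
    if target.scheduled_for_deletion:
        reasons.append("will be deleted soon")
    if not target.has_planned_changes:
        reasons.append("no planned changes")
    if target.blocked_by_architecture_work:
        reasons.append("blocked by a wider architecture change")
    if target.style_only:
        reasons.append("only improves style")
    if target.masks_domain_issue:
        reasons.append("would hide a bigger product or domain issue")
    return reasons


targets = [
    Target("checkout validation"),
    Target("old export script", scheduled_for_deletion=True, has_planned_changes=False),
]
shortlist = [t.name for t in targets if not postpone_reasons(t)]
print(shortlist)
```

Recording the reason rather than silently dropping the item makes it easier to revisit the decision when circumstances change.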
Step 5: pick a time box
Do not pick “the refactor.” Pick a slice of it.
Examples:
- “Extract payments validation from checkout.”
- “Split one 1,500-line module into three packages.”
- “Remove direct DB access from the API layer.”
- “Break the shared util into domain-specific helpers.”
Smaller targets create a shorter feedback loop and a lower chance of half-finished work.
What a practical refactoring decision framework looks like
A good framework answers five questions:
1. Will this code change again soon?
2. How much pain does it cause today?
3. How many systems depend on it?
4. Can I reduce risk with tests or seams first?
5. Can I finish this in a short, reviewable slice?
If the answer to question 1 is no, pause. If the pain in question 2 is high, the number of dependents in question 3 is large, or the answer to question 4 is yes, the target moves up the list. If the answer to question 5 is no, split the work.
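The five questions reduce to a short decision function. This sketch simplifies each question to a yes/no answer, and the "keep in place" outcome for targets with no boosting signal is an assumption of mine, since the text does not name that case.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    will_change_soon: bool        # question 1
    causes_pain_today: bool       # question 2, reduced to yes/no
    many_dependents: bool         # question 3, reduced to yes/no
    can_reduce_risk_first: bool   # question 4
    fits_reviewable_slice: bool   # question 5


def decide(answers: Answers) -> str:
    """Turn the five answers into one of four outcomes."""
    if not answers.will_change_soon:
        return "pause"
    if not answers.fits_reviewable_slice:
        return "split the work"
    boosted = (
        answers.causes_pain_today
        or answers.many_dependents
        or answers.can_reduce_risk_first
    )
    # "keep in place" is a hypothetical fallback, not from the text.
    return "move up" if boosted else "keep in place"


print(decide(Answers(True, True, False, False, True)))
```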
Here is the version I use in practice:
| Question | Why it matters | What to look for |
|---|---|---|
| Will it change again? | Refactoring pays back on future edits | Active feature area, frequent bug fixes |
| Does it create repeated pain? | Pain is a signal, not a feeling | Slow reviews, regressions, fragile tests |
| What is the blast radius? | High fan-out means high leverage | Many dependents, shared utilities |
| Can I make it safe? | Safety reduces effort | Good tests, feature flags, seam extraction |
| Can I finish quickly? | Time-boxing keeps the work real | Clear boundaries, one-owner scope |
This is where repowise becomes useful. Its hotspot analysis demo shows how change frequency and code health combine into a concrete target list, and its architecture page explains how those signals are assembled into a repo-wide view. If you want to see that data shape on a real project, the FastAPI dependency graph demo makes the hidden coupling obvious. Those views are built for this decision problem. (codescene.com)
Doing it automatically
Manual scoring works for a few dozen targets. It breaks down when the repo grows and the refactor list keeps moving.
Automation helps in three places:
- Discovery: find files and modules that deserve review.
- Ranking: sort candidates by expected impact.
- Explanation: show why a target is hot, risky, or hard to change.
That is where code intelligence tools earn their keep. The Model Context Protocol is now an open standard for connecting AI systems to data sources and tools, with an official spec and SDKs published by the MCP project. OpenAI also documents MCP support for its developer tools and remote servers. (anthropic.com)
For refactoring work, the point is not “AI writes the fix.” The point is that an agent can ask for the right context before it guesses. A repo with file-level docs, dependency paths, ownership, churn, and hotspot data gives you a better shortlist than grep ever will.
repowise refactoring-targets view
Repowise combines four signals that matter for refactoring priorities:
- Auto-generated docs for modules and symbols.
- Git intelligence for churn, ownership, and co-change patterns.
- Dependency graphs for fan-out and hidden coupling.
- Code health biomarkers for risk and maintainability.
That combination is useful because it cuts through the usual argument. You do not need to debate whether a package is “messy.” You can inspect why it is expensive to change. See the auto-generated docs for FastAPI for what that context looks like, then compare it with the ownership map for Starlette to see how single-owner areas stand out in git history. (codescene.com)
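To get a feel for the single-owner signal without any tooling, you can measure how concentrated the commits on a path are. The sketch below is a generic approximation and makes no claim about how Repowise computes ownership. It assumes input from `git log --pretty=format:%an -- <path>`, and the 75% threshold and author names are arbitrary choices for the example.

```python
from collections import Counter


def top_owner_share(author_lines: str) -> tuple[str, float]:
    """Return the most frequent author and their share of commits.

    Expects one author name per commit, as printed by:
        git log --pretty=format:%an -- <path>
    """
    authors = [line.strip() for line in author_lines.splitlines() if line.strip()]
    if not authors:
        raise ValueError("no commits found for this path")
    name, commits = Counter(authors).most_common(1)[0]
    return name, commits / len(authors)


# Hypothetical author list for one module.
sample_authors = "dana\ndana\ndana\nlee\ndana\n"

owner, share = top_owner_share(sample_authors)
SINGLE_OWNER_THRESHOLD = 0.75  # arbitrary cut-off for this sketch
print(owner, share, share >= SINGLE_OWNER_THRESHOLD)
```

Commit share is a crude proxy for knowledge. A reviewer who never commits to the module may still understand it well.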
A practical workflow looks like this:
- Open the hotspot view.
- Sort by risk or code health.
- Check dependents and co-change partners.
- Confirm the module still changes often.
- Pick the smallest slice that removes the most friction.
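The "co-change partners" step can also be approximated by hand. The following sketch counts how often pairs of files appear in the same commit. It is a generic illustration, not the tool's algorithm, and the commit data is invented.

```python
from collections import Counter
from itertools import combinations


def co_change_counts(commits: list[set[str]]) -> Counter:
    """Count how often each pair of files changed in the same commit."""
    pairs: Counter = Counter()
    for files in commits:
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(files), 2):
            pairs[(a, b)] += 1
    return pairs


def partners(path: str, pairs: Counter) -> list[tuple[str, int]]:
    """List the files that changed together with `path`, most frequent first."""
    found = []
    for (a, b), count in pairs.items():
        if path == a:
            found.append((b, count))
        elif path == b:
            found.append((a, count))
    return sorted(found, key=lambda item: (-item[1], item[0]))


# Hypothetical commits, each given as the set of files it touched.
commits = [
    {"checkout.py", "payments.py"},
    {"checkout.py", "payments.py", "cart.py"},
    {"cart.py"},
    {"checkout.py", "tax.py"},
]

pairs = co_change_counts(commits)
print(partners("checkout.py", pairs))
```

Files that keep changing together without an explicit import between them are a common sign of hidden coupling.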
If you want the tooling path, try repowise on your own repo or start with pip install repowise && repowise init. The MCP server is configured automatically, so an agent can query the repo context without extra wiring. (platform.openai.com)
What good first-quarter targets look like
The best first-quarter targets are boring in the right way. They are not heroic rewrites. They are the pieces that keep getting in the way.
Good candidates usually have these traits:
- They are active.
- They are small enough to finish.
- They sit on a path that many changes touch.
- They cause repeated review comments or bugs.
- They have a clear seam for extraction.
Strong examples
- A shared utility file that became a junk drawer
  - High fan-out.
  - Lots of unrelated helpers.
  - Changes every week.
- A package that handles too many concerns
  - Parsing, validation, persistence, and I/O in one place.
  - Hard to test in isolation.
  - Every feature request lands there.
- A critical path with repeated regressions
  - Checkout, auth, billing, sync, or scheduling logic.
  - The code may not be large, but it is expensive to touch.
- A module with hidden ownership
  - Only one engineer knows it.
  - Every change needs tribal knowledge.
  - Bus factor is low.
- An interface with obvious duplication across callers
  - Same validation repeated in multiple services.
  - Easy to extract, easy to test, easy to review.
Weak examples
- A static admin script nobody runs.
- Dead code scheduled for deletion.
- A pretty file with no future work.
- A rewrite requested because “it feels old.”
- A large subsystem that would need months of coordination before the first merged change.
A useful filter is this: if the code disappeared tomorrow, would anyone feel pain in the next 30 days? If not, it is probably not the first thing to fix.
When to refactor vs rewrite
This is one of the most common false choices in engineering planning. A rewrite feels decisive. A refactor feels slow. The right choice depends on changeability, risk, and how much of the existing system still works.
Refactor when:
- The system mostly works.
- You can make safe incremental changes.
- The pain is local.
- The code will keep changing.
- You can verify behavior with tests.
Rewrite when:
- The architecture is wrong for the job.
- The existing design blocks basic requirements.
- The domain has changed so much that adaptation is costlier than replacement.
- You can run both systems during migration.
- You have time, staffing, and rollback paths.
Martin Fowler’s framing is useful here: refactoring is for improving the design of existing code through small, behavior-preserving steps. That means a rewrite is a different bet. It replaces the system, while refactoring improves it piece by piece. (martinfowler.com)
A simple decision rule:
| Condition | Refactor | Rewrite |
|---|---|---|
| Existing behavior mostly correct | Yes | Maybe |
| Need gradual delivery | Yes | No |
| Tests available | Yes | Helpful but not required |
| Architecture fundamentally wrong | Sometimes | Often |
| Team can support dual run | Sometimes | Often required |
If you are unsure, start with a refactor spike. A small extraction or boundary cleanup will tell you whether the system can be shaped in place.
How to build a refactoring backlog that stays useful
A backlog only works if it stays current.
Use this operating model:
- Review it with real work. Add candidates during bugs, incidents, and feature work.
- Re-score monthly. Change frequency and ownership move quickly.
- Delete stale items. If the code is no longer active, remove it.
- Attach targets to owners. Someone should be able to explain the next step.
- Track outcomes. Measure fewer bugs, faster reviews, shorter lead time, or lower incident count.
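Part of this operating model can be checked mechanically. The sketch below flags items to delete, re-score, or assign. The 90-day staleness window, the data fields, and the sample items are assumptions for illustration; only the monthly re-score cadence comes from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class BacklogItem:
    name: str
    owner: Optional[str]
    last_code_change: date  # last commit touching the target code
    last_scored: date       # last time impact and effort were scored


def review_backlog(
    items: list[BacklogItem],
    today: date,
    stale_after: timedelta = timedelta(days=90),    # assumed window
    rescore_after: timedelta = timedelta(days=30),  # "re-score monthly"
) -> dict[str, list[str]]:
    """Sort backlog items into delete, rescore, and needs_owner buckets."""
    actions: dict[str, list[str]] = {"delete": [], "rescore": [], "needs_owner": []}
    for item in items:
        if today - item.last_code_change > stale_after:
            actions["delete"].append(item.name)
            continue  # stale items need no further checks
        if today - item.last_scored >= rescore_after:
            actions["rescore"].append(item.name)
        if item.owner is None:
            actions["needs_owner"].append(item.name)
    return actions


# Hypothetical backlog.
backlog = [
    BacklogItem("split shared utils", "dana", date(2024, 5, 20), date(2024, 4, 15)),
    BacklogItem("old export script", None, date(2023, 11, 1), date(2024, 1, 10)),
    BacklogItem("checkout validation", None, date(2024, 5, 28), date(2024, 5, 25)),
]

result = review_backlog(backlog, today=date(2024, 6, 1))
print(result)
```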
The most valuable backlog item is often the one that keeps appearing in unrelated work. That repetition is the signal.
FAQ
What is refactoring prioritization?
It is the process of choosing which code to clean up first. Good refactoring priorities focus on impact per effort, not just code ugliness.
What should I refactor first?
Start with active code that changes often, causes repeated pain, and has a wide blast radius. If you can fix it in a small, safe slice, it moves to the top of the list.
How do I measure impact per effort?
Score impact using change frequency, bug history, dependency fan-out, and delivery drag. Score effort using files touched, test coverage, API changes, and migration cost. Divide impact by effort to rank candidates.
Is technical debt prioritization the same as refactoring priorities?
Close, but not identical. Technical debt prioritization covers many kinds of debt, including missing tests, poor architecture, and operational risk. Refactoring prioritization is the code-shape part of that larger problem.
When should I use a tool instead of manual scoring?
Use a tool when the repo is large, ownership is unclear, or you need evidence from history and dependency graphs. Manual scoring is fine for a small list. Automation helps when you need a repeatable refactoring decision framework across many services.
Can I use this with AI coding tools?
Yes, if the tools have the right context. MCP is an open standard for connecting AI systems to data sources and tools, and OpenAI documents support for remote MCP servers in its developer tooling. That means an agent can ask for repo context before proposing a change. (anthropic.com)
What does “done” look like for a refactor?
Done means the code is easier to change, tests still pass, the behavior is stable, and the next edit will be cheaper than before. If the code only looks cleaner but still hurts to touch, the refactor was cosmetic.
Where does repowise fit in this workflow?
Repowise turns repo context into ranked targets. It exposes docs, history, ownership, dependency paths, dead code, and code health through its own MCP server, so an agent can inspect the codebase before making a refactoring call. If you want a concrete starting point, use the live examples or the architecture page to see the data model behind the output. (platform.openai.com)
One last rule for refactoring priorities
If a change will affect a lot of future work, is easy to verify, and has a narrow scope, do it first. If it is large, risky, and far from active work, leave it alone until the signal gets stronger. That rule keeps refactoring tied to value instead of taste.


