Should we refactor files we’re working on?

When programming, always follow the camping rule: Always leave the code base healthier than when you found it.

- Martin Fowler, Refactoring

The Boy Scouts of America have a simple rule that we can apply to our profession. Leave the campground cleaner than you found it.

- Robert Martin, Clean Code

Many of us share the attitude expressed by the above Fowler and Martin quotes. The attitude presumes that code we’re working on now will change again soon, and we’ll reap the benefits of a refactor when that happens.

Here’s another common attitude: we don’t get enough time to refactor.

These attitudes are related: insofar as we hope to make future work easier, the above camping rule can lead to sub-optimal decisions about what code gets refactored. Those sub-optimal decisions can make us slower at building software than we could be, which means — among other things — that we have less time to refactor.

How often do these sub-optimal decisions happen? The answer depends on how close the link is between current edits and future ones. I’ve already claimed that the link here is tenuous and suggested an alternative way of thinking about what code we should refactor, but here I measure the link between diffs (diff = number of lines changed) at time t and diffs at time t + 1. In other words, this post is about the extent to which diffs are serially correlated across months and quarters.

I’m only going to look at one repo here: the react.js repo. It’s the first one I chose and it happened to be one where diffs aren’t strongly serially correlated. The median of the 1-month autocorrelation for all source file diffs in the react.js repo is -.098. This means that if you're editing a random file in the react repo and you refactor it hoping that someone will benefit from it in a month, you're making a bad bet. If the react repo is representative of the codebases we tend to work in, we should rethink how we decide what code gets refactored.

In the first section, I breakdown the results I found while analyzing the react repo. In the next section, I go over the source code used for this analysis so you can make sure I’m not completely full of shit and so that you can run it on your own codebase and see how well the camping rule is serving you. I end with some caveats on the implications of this analysis and some suggestions for further questions we might ask.


packages/react-devtools-core/src/backend.js : 0 0 0 284 2 0 0 0 16 0 0 0 packages/react-devtools-core/src/editor.js : 0 0 0 190 0 0 0 0 0 0 4 0 packages/react-dom/src/client/ReactDOMHostConfig.js : 0 884 177 43 24 34 68 30 15 53 279 405

The first line of this list says that

  • in the first three months, backend.js had 0 line edits
  • in the fourth month, there were 284 line edits
  • in month 5 there were 2 line edits
  • in the last month, there were 0 line edits

The 1-month-lagged autocorrelation for this file would be low because the number of lines changed in 1 month doesn’t tell you much about how many lines of code will be changed in the following month in that file.

The story is similar when I group diffs by quarter instead of month. The same files would like like this:

packages/react-devtools-core/src/backend.js : 0 286 16 0 packages/react-devtools-core/src/editor.js : 0 190 0 4 packages/react-dom/src/client/ReactDOMHostConfig.js : 1061 101 113 737

Note that I’m filtering out things like json files, jest snapshot files, and index.js files.

The results are presented as box and whisker plots of these auto-correlations over 1686 files in the react repo. This enables us to easily see the medians of the correlations across all files and the spread of the auto-correlations for all files, and it’ll help us avoid being mislead by outliers.

The histograms are plotted with the hist R function with it's default parameters.

Monthly serial correlations

Image for post
Image for post

Two things I want to highlight here.

First, the median of the auto correlations across all files is only -.0984. This means that 50 percent of the autocorrelations for files are at this value or less. Again, this shows that refactoring a random file in hopes that someone will come along next month to benefit from your work is not a good bet.

Those of us who lack strong intuitions about what good correlation coefficients look like (I’m one of them, to be honest) can get a rough idea of what this number conveys by just looking at the time series data for a bunch of random files:

fixtures/fiber-debugger/src/describeFibers.js : 0 111 0 0 0 0 0 0 0 0 0 0 src/devtools/views/Profiler/CommitFilterModalContext.js : 0 35 0 0 0 0 0 0 0 0 0 0 src/devtools/views/Components/Element.js : 118 233 0 0 0 0 0 0 0 0 0 0 scripts/eslint-rules/__tests__/no-production-logging-test.internal.js : 0 0 0 0 0 0 234 111 0 0 0 0 packages/shared/ReactTypes.js : 0 114 90 20 7 10 5 12 19 3 64 7

Note that while there are some files that see a lot of changes consistently, most files just get changed and aren’t touched for quite a few months. Of course, this means the serial correlation for that file isn’t very strong.

Here’s the second thing I want to highlight: there’s a lot of outliers here (558). So, there’s a lot of variability in the autocorrelation of diffs across files. This variability challenges the idea that we’re making optimal bets by always refactoring files we touch.

Quarterly serial correlations

Image for post
Image for post

Here’s the quarterly serial correlations. 117 outliers this time. 75% of the files have a correlation value of -.083 or less.

The motivation for considering both monthly and quarterly serial correlations was to address the objection that monthly time slices for diffs were too thin to appreciate the relationship between diffs at different times. As you can see, the correlation gets a little better, but not by much.

If quarterly is still too thin a time-slice, this still implies we need to rethink our refactoring choices. If it usually takes over 3 months to reap the benefits from a refactor, then we have to weigh our decision to refactor against more immediate gains we may realize by doing something else, like refactoring a file that will change more frequently or release a feature more quickly.

Methods and Source

Bash + Git

If you run this on the react repo, for example, you wind up with output that looks like this:

Month Lines-Added Lines-Deleted Path May-2019:June-2019 7 6 shells/dev/src/devtools.js May-2019:June-2019 12 6 shells/utils.js May-2019:June-2019 116 0 src/__tests__/__snapshots__/inspectedElementContext-test.js.snap ... June-2019:July-2019 7 0 packages/create-subscription/npm/index.js June-2019:July-2019 23 0 packages/create-subscription/package.json June-2019:July-2019 489 0 packages/create-subscription/src/__tests__/createSubscription-test.internal.js ... Dec-2019:Jan-2020 2 2 packages/react-interactions/events/src/dom/ContextMenu.js Dec-2019:Jan-2020 2 2 packages/react-interactions/events/src/dom/Drag.js Dec-2019:Jan-2020 152 69 packages/react-interactions/events/src/dom/Focus.js ... April-2020:May-2020 1 0 scripts/rollup/validate/eslintrc.rn.js April-2020:May-2020 1 0 scripts/rollup/validate/eslintrc.umd.js April-2020:May-2020 19 19 scripts/shared/inlinedHostConfigs.js

R + Jupyter

Next, we build a map from files to a monthly time-series of diffs. Here’s the output:

packages/react-devtools-core/src/backend.js : 0 0 0 284 2 0 0 0 16 0 0 0 packages/react-devtools-core/src/editor.js : 0 0 0 190 0 0 0 0 0 0 4 0 packages/react-dom/src/client/ReactDOMHostConfig.js : 0 884 177 43 24 34 68 30 15 53 279 405 packages/react-dom/src/client/ReactDOMInput.js : 0 426 0 0 0 0 0 13 8 0 16 0 packages/react-dom/src/events/SyntheticWheelEvent.js : 0 43 0 0 0 0 0 0 18 0 0 0

Then, we loop over all the keys in our map and build a matrix of autocorrelation calculations, where the rows are files and the columns are autocorrelations at various lags (up to 6):

auto_correlations <- list() for (k in keys(files_to_time_series)) { diff_time_series <- unlist(files_to_time_series[[k]]) auto_corr <- acf(diff_time_series, lag.max = 1, plot = FALSE)$acf auto_correlations <- list.append(auto_correlations, auto_corr) } acf_matrix <- t(array(unlist(auto_correlations), dim= c(2, length(auto_correlations))))

From here, we can calculate all the stats we covered in the results section with simple calls to apply.

Caveats, Objections, and Future Directions


Refactoring helps us now too

for each desired change, make the change easy (warning: this may be hard), then make the easy change

- Kent Beck (@KentBeck) September 25, 2012

When refactoring is justified solely because it makes current work easier, we should refactor. For example, sometimes refactoring helps us understand the code we’re changing. The above analysis doesn’t challenge the value of this kind of refactoring. In cases where we appeal to the future benefits of refactoring, the above data suggests we should be more cautious.

Libraries vs. Apps and does this result generalize?

Counting diff sizes vs. counting whether a file is edited at all


What about the benefit of refactoring on future reading of files?

Won’t we eventually wind up in an okay codebase if we consistently follow the campground rule?

For example, Disney doesn’t tell their technicians to do preventative maintenance for every ride, churro deep fryer, and AV system they happen to be touching. Instead, they make statistically informed guesses at when those machines will fail and balance the cost of doing preventative maintenance now against everything else they might do to streamline their operations. We should have tools and practices like this as software engineers. I’ve already made some suggestions in this direction, but I more interested in seeing more discussion about how we move in this direction than I am in pushing my particular solution.


  1. See this caveat for why this phrase is italicized.
  2. Easy example: Management isn’t rushing your work. You happen to have plenty of time to finish the ticket you’re currently working on, so you clean things up a bit while you work and commit your code, feeling like a real boy scout coder. Then, for whatever reason, you fall behind while working on the next feature. You realize that you’re working in a part of the codebase that a) changes very frequently and b) implements the most important feature for the business. Unfortunately, you don’t have the time to refactor this code while you work. You bemoan the lack of time that management gives you for refactoring.
  3. The rest of the bash script is here.
  4. You can see the entire notebook here.
  5. I refuse to treat data as a plural noun. It’s weird, and people have been saying that it’s fine for it be singular for decades. See Elements of Style, for example.

Originally published at on May 9, 2020.

Written by

Wannabe philosophy professor turned wannabe tech entrepreneur.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store