Code duplication is fine, KISS the DRY in its b***

6 min read · Dec 30, 2019

SOLID, KISS, YAGNI, DRY: we all know them. We learn them at university or when our code is reviewed by senior colleagues.

But do you sometimes feel that copying a big chunk of code is the right thing to do? And yet you hesitate, sensing that you are breaking some fundamental rule of computer science? The good news is that you may copy the code if the context is right!

“I have unified that migration script code…”©

The story begins with my junior colleague receiving a task to code a simple one-time migration script. You know, something like “take stuff from DB X, make some REST call, save result in DB Y”.

After discussing possible solutions, I was pretty sure the workflow was about to go like this:

copy-paste the previous script, which already implements batching, parallelization and fault tolerance
change the data structures and transformations
Voila!
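To give a feel for that workflow, here is a minimal sketch of what such a self-contained script might look like. Everything in it is an assumption made for illustration: the helper names (`batches`, `with_retries`, `migrate`), the in-memory lists standing in for DB X and DB Y, and the `transform` function standing in for the REST call. The point is only that the plumbing stays the same between copies while the marked part changes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 2
MAX_RETRIES = 3


def batches(rows, size):
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def with_retries(func, arg, attempts=MAX_RETRIES):
    """Call func(arg), retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func(arg)
        except Exception:
            if attempt == attempts:
                raise


# --- The only part that changes between copies of the script ---

def transform(row):
    # Hypothetical stand-in for the REST call and the data mapping.
    return {"id": row["id"], "name": row["name"].upper()}


# --- Copied as-is from the previous script ---

def migrate(source_rows, target, workers=4):
    """Transform each batch in parallel and append the results to `target`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(source_rows, BATCH_SIZE):
            results = pool.map(lambda row: with_retries(transform, row), batch)
            target.extend(results)


# In-memory lists stand in for DB X (source) and DB Y (target).
source = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "cy"}]
target = []
migrate(source, target)
```

For the next migration you would copy the file and rewrite `transform` (and the source and target), leaving the rest alone.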

But the pull request shocked me! Two extra “if” statements, several new methods, some abstraction on top of both scripts… I expected a simple copy-paste, and instead there was a semi-reusable solution with far more complicated code.
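To make that concrete, here is a minimal sketch of the shape such a unified solution tends to take. It is not the actual pull request: the `run_migration` function, the `kind` parameter and both transformations are made up for illustration.

```python
def run_migration(rows, kind):
    """Hypothetical shared runner serving two one-time scripts."""
    migrated = []
    for row in rows:
        # First extra "if": each script needs its own transformation.
        if kind == "users":
            record = {"id": row["id"], "name": row["name"].upper()}
        elif kind == "orders":
            record = {"id": row["id"], "total": row["net"] + row["tax"]}
        else:
            raise ValueError(f"unknown migration kind: {kind}")

        # Second extra "if": only one of the scripts skips some rows.
        if kind == "orders" and record["total"] == 0:
            continue

        migrated.append(record)
    return migrated
```

Every new script adds another branch, and a reviewer now has to reason about all of the branches to be sure a change to one of them is safe.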

“But duplicated code… what about the DRY principle?”

And that is a reasonable argument. However, let’s consider the benefits of the copy-paste approach (the easier one):

the solution is delivered quicker

Many programmers tend to forget that we code for the business. The business doesn’t care which coding techniques we use (except maybe the DDD approach) as long as we deliver on time.

They start to worry when the features we estimate take weeks to finish, especially when our design cannot accommodate rapid changes.

smaller changes equal simpler code review

The code reviewer already knows the parts responsible for parallelization, batching and fault tolerance, having seen them in the previous script. The reviewer can concentrate on the data and the transformations, which makes the code review faster.

individual scripts are easier to change

Want to change the parallelization or retry logic within one of the scripts? Easy! Need to rerun a previous script with different transformations because of some data inconsistency? You only touch code that nothing else shares.

The drawback? Code duplication. But code duplication is only bad if business or technical needs force you to change all the copies consistently.

And there are worse things than code duplication.

Now let’s consider some other “rules of thumb” people share about code duplication and let’s learn from their experience.

Beware the abstraction!

Code duplication is wrong because you might have to change the implementation in several places. Is a single code duplication problematic?

In his book “Refactoring: Improving the Design of Existing Code”, Martin Fowler mentions “The Rule of Three” (following Don Roberts), which more or less means: refactor after you write something for the third time, not sooner.

From now on I will use the word “abstraction” to mean any refactoring technique that reduces code duplication (abstract classes, extract method, etc.).

There are 3 possible outcomes of the refactoring with abstraction, 2 of which are negative ones:

you never reuse this abstraction: you’ve coded something smart, yet useless
your abstraction makes new business requirements hard to implement because the two don’t fit together: you’ve lost time as well, and you will need to refactor again

Refactoring should make your code easier to modify, reuse and understand. Does your refactoring meet those goals? Always consider whether gaining reusability at the cost of tangling the logic and complicating the code’s usage is worthwhile.

Coupling unrelated code: accidental code duplication

There are also cases where code duplication is accidental: unrelated parts of the business logic happen to have a similar code structure. Moving those parts into a shared library couples them, which blocks the independent evolution of previously autonomous pieces of code.

As Udi Dahan says in the “Beware the Share” chapter of the book “97 Things Every Programmer Should Know”, context changes everything. What works fine in one environment might have the opposite effect elsewhere. In his words about reducing code duplication:
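A tiny, made-up example of accidental duplication: the two functions below have identical bodies today, but they encode rules owned by different parts of the business. Both function names and both rules are hypothetical.

```python
def is_valid_username(value: str) -> bool:
    # Account rule: 3 to 20 alphanumeric characters.
    return 3 <= len(value) <= 20 and value.isalnum()


def is_valid_coupon_code(value: str) -> bool:
    # Marketing rule: happens to have the same shape today.
    return 3 <= len(value) <= 20 and value.isalnum()
```

If marketing later decides that coupon codes may contain dashes, only `is_valid_coupon_code` changes. Had both been merged into one shared validator, that change would either alter the username rule by accident or force a flag into the shared code.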

“When applied in the right context, these techniques are valuable. In the wrong context, they increase costs rather than the value”.

When we’re talking about a one-time migration script (developed to run once, as the name says), there are no changes planned for the future. Even if you find a bug in a previous implementation, you can simply leave a comment in the historical scripts and implement the fix only in the next one. So in this context code duplication is simply OK.

Code duplication example in the microservice world

The biggest benefit of microservice architecture is said to be the independent development of services. Yet it introduces a different set of problems, like “how to manage reusable code?”.

Let’s take bootstrapping a new microservice as an example. In our Machine Learning team, we decided to use Python in our microservices, as many ML libraries have Python bindings. We have a “microservice-generator”, a simple UI-based service for quickly starting new projects. It sets up CI, CD and a GitHub repository, and deploys a dummy version of the service to production. As Python hadn’t been used in our company before, the service generator didn’t know how to bootstrap a Python project from scratch.

We knew that we would eventually add a Python template to our microservice generator, yet we delayed it. We always started by “generating a plain project template”.

After a year we had a list of Python libraries, approaches and techniques we liked. Getting there was a continuous process of running experiments, some of which were, of course, unsuccessful. Creating a Python project template earlier would have made us change it several times during that process. There would have been no benefit in that, only extra work.

Old projects would still have been out of sync with new ones, as you don’t go and rework old projects “just because” the template has changed. Postponing the Python project template was the simplest and most cost-efficient solution.

Principles are meant to be broken

The significance of the great four (SOLID, KISS, YAGNI, DRY) is indisputable. But they are called principles for a reason.

You don’t have to “bend the rules”. Treat them as guidelines instead and apply them when appropriate. Don’t follow them blindly though.

When we’re young we learn not to run across a street without a zebra crossing. The same way we learn about DRY as junior programmers.

But in adult life, we can decide to run across the road when the context is right (your train is leaving soon, no cars ahead). Treat programming principles the same way.

KISS

I find KISS to be the foundation of the other rules. If your refactoring makes the code easier to modify, reuse and understand, AND it is valid in the current context, go for it. Sometimes it simply doesn’t make sense to reduce code duplication through refactoring.

So next time your colleague asks why you didn’t refactor that duplicated code, keep calm. Say that you have followed the “Copied, Inserted, Seems Simpler” approach.