I’m cursed.
For some reason, at every company I’ve ever joined, there was this luminous moment when someone decided to do a rebuild.
Every time such a decision was made, my brain happily reminded me of Joel Spolsky’s cautionary words on rebuilds:
“When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.
You are throwing away your market leadership. You are giving a gift of two or three years to your competitors, and believe me, that is a long time in software years.
You are putting yourself in an extremely dangerous position where you will be shipping an old version of the code for several years, completely unable to make any strategic changes or react to new features that the market demands, because you don’t have shippable code. You might as well just close for business for the duration.
You are wasting an outlandish amount of money writing code that already exists.”
- Joel Spolsky in Things You Should Never Do, Part I
Joel Spolsky argues that programmers mainly want to throw away code because they believe it is a mess.
There are two primary reasons they believe the old code is a mess:
Reading old code is harder and less fun than writing code.
They lack the context and understanding from when the old code was written and judge it with their current (insufficient) perspective and understanding.
I recommend reading the original article because Joel explains it far better than I can.
But let’s put Joel’s words of caution aside for now. Let’s assume you’ve read the article, given it some thought, and still think that rebuilding your product is a good idea.
As the saying goes: fortune favors the brave. And you must be very brave indeed!
How do you proceed? How do you increase your chances of making your product rebuild a success? What should you absolutely not do when rebuilding a product?
As I said before, I’m cursed. I’ve been directly involved in 6 product rebuilds with the scar tissue to prove it. That’s why I believe I’m in an excellent position to answer this question based on experience.
I’ve seen the whole spectrum of rebuild successes and failures. Some rebuilds I was involved in were massive successes raking in millions of dollars. Other rebuilds were total duds, with the plug being pulled after a few months.
While these rebuilds failed for many different reasons, the most frequent and explosive cause was ignoring Gall’s law.
What’s Gall’s Law?
I once worked at a company where one of the lead developers stated they needed 3 months without delivering any features or writing any code, just to come up with the system architecture for a product rebuild. He believed that’s how a good system is built.
You spend sufficient time thinking, and by thinking long and hard enough, you can build the perfect complex system. However, this approach completely violates Gall’s law and is guaranteed to result in failure.
Gall’s law states the following:
“A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched to make it work. You have to start over with a simple working system.” - John Gall, Systems Theorist & Pediatrician
Building a complex system is enticing. Developers love challenging themselves by putting their thinking hat on and coming up with a cool and complicated architecture. But is this really how the best systems are built?
If you’ve ever worked on a technical debt monstrosity of a system, the following quote from John Gall will also ring true:
“The system is its own best explanation - it is a law unto itself. They develop internal goals the instant they come into being and these goals come first. Systems don’t work for you or me. They work for their own goals and behaves as if it has a will to live.” - John Gall
And herein lies the conundrum: the more technical debt, the more scarred and scared developers become. The moment you decide to do a rebuild, they want to get it right. They want to prevent all the mistakes of the past. They want to layer certainty on certainty, to prevent ending up in the same miserable situation they are currently in. They know all the problems and limitations of the existing system and are in prime position to execute the perfect rebuild.
I once worked somewhere where developers proposed the following for a rebuild:
95% unit test coverage
All flows covered by E2E tests
All domains worked out up-front
Everything a micro-service from the start
What happened to this rebuild? It failed and was scrapped.
The difficult part is getting the balance right: too much designing ahead of time results in premature optimization and a complex system that doesn’t behave as you’d like. Too little designing ahead also results in a complex system that doesn’t behave as you’d like.
The key insight is this: your system will have to face real-world constraints, but your system will also produce its own constraints you must deal with. And you don’t discover those constraints until the system is working and running.
The more complex the system you are designing, the longer it takes before it will be live, and the more its unintended constraints can suck you in and take on a life of their own.
And that’s why the following often happens: technical debt begets more technical debt. Even when rebuilding from scratch.
We let fear take the lead and let the problems of our existing system dictate the new system, thereby ignoring the new problems we’re spawning in the system we’re creating.
What Should We Be Doing Instead?
An underrated superpower: worrying about things at the right moment.
For example, when building a new product, it is easy to worry about everything and get sucked into discussions on all the different options and considerations you are facing. When you try to solve all of them at the same time, you won't. You will become paralyzed by all the stuff you still have to figure out and end up in meeting hell.
Another problem is that you'll be making all your decisions at a point in time when you have the worst understanding and the least information. What are the things we have to solve now and what are the things we can leave for our future selves to worry about?
Postponing may seem like you're passing on the hot potato, but you are actually delaying the decision to a moment where you will have more knowledge and a better understanding.
Plus you simply can't solve everything at once.
Solutions that gradually emerge end up being better than solutions that try to aim for a hole-in-one — and fail.
Premature optimization is the root of all evil. Emergent design is the enemy of premature optimization. The key is to make today's decisions in a way that keeps your options open, so you can still optimize toward a great solution later.
That’s what we should be thinking about.
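To make "keeping your options open" a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not a recipe: the names `OrderStore`, `InMemoryOrderStore`, and `place_order` are hypothetical and do not come from any of the rebuilds described above. The idea is to start with the simplest storage that works and keep the rest of the code ignorant of that choice, so the real decision can wait until the running system tells you what it needs.

```python
from typing import Protocol


class OrderStore(Protocol):
    """Hypothetical interface: the only thing the rest of the code knows about storage."""

    def save(self, order_id: str, total_cents: int) -> None: ...

    def total_for(self, order_id: str) -> int: ...


class InMemoryOrderStore:
    """The simple system that works: a dict, no database decision made yet."""

    def __init__(self) -> None:
        self._orders: dict[str, int] = {}

    def save(self, order_id: str, total_cents: int) -> None:
        self._orders[order_id] = total_cents

    def total_for(self, order_id: str) -> int:
        return self._orders[order_id]


def place_order(store: OrderStore, order_id: str, prices_cents: list[int]) -> int:
    """Business logic depends on the interface, not on a storage technology."""
    total = sum(prices_cents)
    store.save(order_id, total)
    return total


# Usage: swap in a database-backed store later without touching place_order.
store = InMemoryOrderStore()
order_total = place_order(store, "order-1", [1999, 501])
print(order_total, store.total_for("order-1"))
```

If real usage later shows you need a proper database, you write a second class with the same two methods and pass that in instead. Until then, you have shipped something that works and postponed the decision to a moment when you know more.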
The dream of a perfect architecture is the enemy of a good architecture.
Rebuilds are sometimes necessary when the technology used to build the application has been superseded by a newer one. I was the Scrum Master of a team that rewrote a B2B web application originally written in AngularJS, which has since been superseded by Angular. The decision to rewrite the application in Angular was made for two very good reasons. First, Angular is a more capable framework, built on TypeScript. Second, over time, it would have become increasingly difficult to find AngularJS developers to support the application.
The rewrite was 100% successful and included some better design elements. Over time, automated tests came to cover 100% of what could be automated in the new application.
This really spoke to me and my experience. I am looking forward to Part 2!