Inertia and Large Code Bases

Posted on November 17, 2008
Filed Under Computer Science

Or: Long live the king.

What is “Code Inertia”?

I’ve been interested in “code inertia” for a few years now. It should sound familiar to anyone with a basic understanding of physics: code with a lot of inertia is very resistant to change. You’d either have to be crazy, uninformed, or very well paid to try to shake the way the system is organized.

I met my first gigantic and arcane system at my college internship. It was a Java-based web platform with 8 years of cruft built on top. There was no documentation for even creating a development environment, and it had a complex and lengthy build system. It was a mountain of pain ready to strike. There was no way in hell that the system could be considered “beginner friendly,” and there was no clear path for gaining knowledge of the system.

It was so complex that only a handful of individuals had the Keys to the Kingdom. Everybody else used the tools produced by the Chosen Ones and went on their merry way. I tried to become a Holder of Knowledge, but the system proved too great for me, the college intern.

Why do I mention it? Because the whole system had a gigantic amount of inertia! There was so much code built on top of the mountain that nothing could be changed. You couldn’t change the build system, because then you’d have to alter a mountain of configurations for hundreds of sites. Despite the amount of code, very little of it was actually reusable, and there was no central repository of knowledge for code reuse.

The most helpful changes became the ones that were the hardest to make. It was a big, suffocating force, and the company eventually abandoned the platform for a shiny new toy called SharePoint. The old way of doing things simply wasn’t sustainable. Whether they learned their lesson or not I’ll never know, because I have since left the company and landed a real programming job.

After surviving that monstrosity, I appreciate my current job’s gigantic, fairly well-documented, and accessible code base quite a bit. It’s not close to perfect, but it does a lot of real-world tasks very well, and beginners can use a good chunk of it pretty quickly.

Windows.h

If you want THE BEST example of inertia, just look at “windows.h”. It might have the greatest code inertia of any file not defined in a language standard. It is the basis for every Win32 application. Could it be modified? Potentially, but only by appending to it. Its existing contents could never be changed without wasting an impressive number of programmer hours.

This is unfortunate, because it has classic design errors. Windows programming still isn’t very “easy” with it: you have to watch #include order and know what macros it defines (its min and max macros, for example, clash with std::min and std::max unless you define NOMINMAX first). Yuck. However, none of these things can really be fixed, because even innocent-seeming changes would probably break builds for arcane reasons. Obviously, it should be replaced.

The .NET Framework is the first Microsoft offering that has the potential to deliver a knockout blow to the old way of Win32 programming, since MFC wasn’t up to the task. Some people reject .NET for the runtime size, and others reject it because it’s Microsoft-specific, but it’s a lot easier to work with than Win32 for your run-of-the-mill windowed application.

Reasons for Inertia

Code can pick up inertia for different reasons:

So many files depend on this code that changes would be prohibitively expensive.

This is (in my personal experience) the most common kind of code inertia, as it’s hard to avoid even for well-designed systems. You design a few nice classes, build useful utilities around them, and use the library in 162 programs.

You could easily change the low-level code underneath the utility abstractions. Unfortunately, the top-level interface has picked up quite a bit of inertia. If you want your whole system to be consistent, you’ll need to make quite a few changes. Otherwise, you can do sneaky things that are good enough for most situations, like deprecating the interface while maintaining a backwards-compatibility layer.
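Here’s a minimal sketch of that deprecate-and-shim approach in Python. The names (`Widget`, `parse_widgets`, `read_widgets`) are hypothetical, made up for illustration: the old entry point keeps working for every existing program, but it emits a `DeprecationWarning` and forwards to the new one.

```python
import warnings
from typing import List


class Widget:
    """Hypothetical domain object handed out by the utility library."""

    def __init__(self, name: str) -> None:
        self.name = name


def parse_widgets(text: str) -> List[Widget]:
    """New top-level interface: one Widget per non-blank line."""
    return [Widget(line.strip()) for line in text.splitlines() if line.strip()]


def read_widgets(text: str) -> List[Widget]:
    """Old interface, kept only as a backwards-compatibility shim."""
    # stacklevel=2 attributes the warning to the caller, not to this shim.
    warnings.warn(
        "read_widgets() is deprecated; use parse_widgets() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return parse_widgets(text)
```

Old callers don’t break, and the warning gives you a way to find and migrate them gradually. Of course, the shim itself is a little bit of inertia you’ll have to remove someday.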

The code itself is so large that changes are prohibitively expensive.

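This is related to the previous point, but without the dependencies. That finance application you were writing for that coal business? Well, they’ve suddenly decided that they want to enter the asphalt business, and they want the application to reflect that. Nothing else depends on the application, but there’s simply too much coal-specific code in it to rework cheaply.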

The code is so complex and ill-written that changing it is likely to break everything.

This code was most likely written “by somebody else.” Everybody tacks their changes on to the system instead of modifying how it works, and hopes that when it falls apart, it’s on the next guy’s watch.

That guy who knew how everything works quit.

“Fricksticks, Frank’s gone. Now nobody knows how the whole Foo suite works. Frick frick frick.”

Predictive Power of Inertia

Let’s say that our company has two million lines of code that work wonders with widgets, and our CEO suddenly sees a golden opportunity in the gadget market.

We have a big problem: our code wasn’t designed with gadgets in mind! You might find some core libraries are abstracted enough to work with gadgets. The rest of it explicitly assumes widgets.

What’s going to happen? As nice as refactoring the entire code base would be (and I think I threw up a little inside thinking about it), it’s obvious that gadget code is going to be added on top of the widget code base instead of being integrated into it. That’s a prediction that anybody can make.

Adding the basic gadget functionality you need NOW will be much less effort than changing your code to handle widgets, gadgets, gizmos, and pandas. It’s the right way to go from a business perspective (less time, less expense, same results), but it’s unsatisfying from a technical perspective, and probably misses some nice opportunities for generalization.
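To make the prediction concrete, here’s a hypothetical Python sketch of what “tacking on” usually looks like. The `Widget` and `Gadget` classes and the pricing formulas are invented for illustration: the existing widget code stays untouched, and gadgets get routed to brand-new, gadget-only code by a type check.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    sprockets: int


@dataclass
class Gadget:
    gears: int
    battery_mah: int


def price_widget(w: Widget) -> float:
    # Existing code: assumes widgets everywhere (made-up pricing formula).
    return 10.0 + 2.5 * w.sprockets


def price_item(item) -> float:
    # Gadget support tacked on: a type check routes gadgets to new,
    # gadget-only code instead of generalizing price_widget().
    if isinstance(item, Gadget):
        return 25.0 + 1.0 * item.gears + item.battery_mah / 1000
    return price_widget(item)
```

Every future product type means another branch in `price_item()`, which is exactly the kind of code that piles on inertia.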

Some people will point out that this sacrifices opportunities down the road. You may also want to work with thingamajigs and whatchamacallits someday, and if you tack on gadget code now, you’re going to end up tacking on code for everything else, too.

It’s important to realize that there’s never a free lunch in any of this. Even if all of your library code has perfect abstractions, you’ll still need to build up some extra gadget-specific code that reads and writes gadgets, interfaces with your existing utilities, etc. It’s all just a matter of ease of modification.
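For contrast, here is a hypothetical sketch of the “perfect abstraction” version, using a `typing.Protocol` that I’ve named `Priceable` for illustration (the classes and numbers are made up). The generic `price()` function never mentions widgets or gadgets, but each class still carries its own type-specific methods: that’s the part that’s never free.

```python
from dataclasses import dataclass
from typing import Protocol


class Priceable(Protocol):
    """Hypothetical generalized interface for anything we sell."""

    def base_price(self) -> float: ...

    def part_count(self) -> int: ...


@dataclass
class Widget:
    sprockets: int

    def base_price(self) -> float:
        return 10.0

    def part_count(self) -> int:
        return self.sprockets


@dataclass
class Gadget:
    gears: int

    # Even behind a perfect abstraction, this gadget-specific code must exist.
    def base_price(self) -> float:
        return 25.0

    def part_count(self) -> int:
        return self.gears


def price(item: Priceable, per_part: float = 2.5) -> float:
    # Generic code: no isinstance checks, so a future thingamajig only
    # needs to implement base_price() and part_count().
    return item.base_price() + per_part * item.part_count()
```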

Law of Large Code Bases

All code bases cross a size threshold where addition is less costly than modification.

When you need new functionality in your code, you have two options:

  1. Modify the existing code to solve your problem.
  2. Create new code to solve your problem.

Sometimes Option 1 is unacceptable, which is unfortunate. In a perfect world, we would write code once, and it would work for all of our situations. In a less perfect world, we would at least always be able to modify our existing code to generalize it. Unfortunately, software is a business, and rewriting or heavy modification isn’t always in the cards.

Option 2 is a short-term solution, but an interesting corollary becomes immediately obvious:

When a code base has crossed the threshold mentioned in the Law of Large Code Bases, adding new code adds further inertia to the code.

When code crosses this boundary, you can consider it a marked man until it is changed for the general case.
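Here’s a deliberately crude toy model of the law and its corollary in Python. The cost constants are made up, and the assumption that modification cost grows linearly with the number of dependents is mine, not a measurement. Once the dependent count passes the break-even point, adding code becomes the cheaper option, and because every change leaves more dependents behind in this model, it never crosses back.

```python
from typing import List

# Toy model only: both constants are made-up units, not measurements.
ADD_COST = 50.0           # fixed cost of writing new code alongside the old
COST_PER_DEPENDENT = 1.0  # cost of updating one file that depends on the code


def cheaper_option(dependents: int) -> str:
    """Return whichever option the toy model says is cheaper."""
    modify_cost = COST_PER_DEPENDENT * dependents
    return "add" if ADD_COST < modify_cost else "modify"


def simulate(dependents: int, requests: int, growth: int) -> List[str]:
    """Handle a series of feature requests and record each choice.

    Assumes every change, however it's made, leaves `growth` more
    dependents behind, so the dependent count only ever goes up.
    """
    choices = []
    for _ in range(requests):
        choices.append(cheaper_option(dependents))
        dependents += growth
    return choices
```

With these numbers the break-even point is 50 dependents; past it, the model picks “add” every time.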

Can We Plan Around Code Inertia?

The planning phase of the project is obviously the best chance to take code inertia into account. Unfortunately, there’s not usually a whole lot that you can do. You’re not always in control of the scope of your project, and you don’t know what problems you’re going to need to solve in 2 years. That’s fine.

Trying to find flexible solutions is a great way to start. Just take a look at Steve Yegge’s post on Property Lists as a programming paradigm. You can express a tremendous number of solutions in terms of property lists, and they’re very flexible if you use them right.
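As a rough illustration, here’s a small property list with a prototype (parent) chain in Python. This is my own sketch of the idea, not code from Yegge’s post; the `PropertyList` class and its `get`/`put`/`has`/`remove` methods are hypothetical names. Lookups fall back to the parent, so a new kind of object can start as a specialization of an old one and override only what differs.

```python
from typing import Any, Dict, Optional

_MISSING = object()  # sentinel so stored None values still count as present


class PropertyList:
    """Hypothetical property list with prototype (parent) inheritance."""

    def __init__(self, parent: Optional["PropertyList"] = None, **props: Any) -> None:
        self.parent = parent
        self.props: Dict[str, Any] = dict(props)

    def get(self, key: str, default: Any = None) -> Any:
        # Walk up the prototype chain until some ancestor defines the key.
        node = self
        while node is not None:
            if key in node.props:
                return node.props[key]
            node = node.parent
        return default

    def put(self, key: str, value: Any) -> None:
        # Writes always land on this object, shadowing any inherited value.
        self.props[key] = value

    def has(self, key: str) -> bool:
        return self.get(key, _MISSING) is not _MISSING

    def remove(self, key: str) -> None:
        # Dropping a local override re-exposes the inherited value, if any.
        self.props.pop(key, None)


# A gadget starts out as a specialization of an existing widget.
widget = PropertyList(color="red", weight=3)
gadget = PropertyList(parent=widget, battery=True)
```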

Another successful strategy is the One Thing Well philosophy. Unix systems come with a broad variety of tools that “do one thing, but do it well.” Chaining the tools together is easy, and a wide range of systems programming problems can be solved in terms of these tools.
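Here’s a hypothetical Python analogue of a pipeline like `grep ERROR | awk '{print $2}' | sort | uniq -c`. Each function does one small thing and takes and returns plain iterables, so they chain the same way the Unix tools do. The function names mirror the tools but are only illustrations; note that `field()` is 0-based, unlike `awk`.

```python
from collections import Counter
from typing import Iterable, Iterator


def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Keep only lines containing `pattern` (a plain substring, not a regex)."""
    return (line for line in lines if pattern in line)


def field(n: int, lines: Iterable[str]) -> Iterator[str]:
    """Yield the n-th whitespace-separated field (0-based), skipping short lines."""
    for line in lines:
        parts = line.split()
        if len(parts) > n:
            yield parts[n]


def count(items: Iterable[str]) -> Counter:
    """Tally identical items, like `sort | uniq -c`."""
    return Counter(items)


log = [
    "ERROR disk full",
    "INFO started",
    "ERROR disk full",
    "ERROR network down",
]
# Chained like: grep ERROR | awk '{print $2}' | sort | uniq -c
errors = count(field(1, grep("ERROR", log)))
```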

In writing this post, I noticed that deprecation/change seems to be the most successful way of supplanting old methodologies. Python 3000 is a great example of this. So is the module system of the Linux 2.6 kernel. It turns out that it’s really easy to change code that’s underneath a layer of abstraction, but you can’t change code that relies on the abstraction.
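A minimal Python sketch of that last point, with hypothetical names: `rename_user()` depends only on the `Storage` protocol, so swapping `DictStorage` for an append-only `LogStorage` requires no changes to the caller. Changing the protocol itself (say, renaming `load()`) would mean touching every caller, which is where the inertia lives.

```python
from typing import Dict, List, Protocol, Tuple


class Storage(Protocol):
    """The abstraction that caller code depends on (hypothetical)."""

    def save(self, key: str, value: str) -> None: ...

    def load(self, key: str) -> str: ...


class DictStorage:
    """Original implementation: a plain in-memory dict."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class LogStorage:
    """Replacement implementation: an append-only log where the last write wins."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []

    def save(self, key: str, value: str) -> None:
        self._log.append((key, value))

    def load(self, key: str) -> str:
        # Scan newest-first so the most recent write for a key wins.
        for k, v in reversed(self._log):
            if k == key:
                return v
        raise KeyError(key)


def rename_user(store: Storage, user_id: str, name: str) -> str:
    """Caller code: written against Storage, so it survives a backend swap."""
    store.save(user_id, name)
    return store.load(user_id)
```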
