The Circle of Failure

First published: 2022-02-19.

I used to work as a software engineer for a telecommunications company with millions of customers. For the most part, this was not an IT job, but rather an engineering job. My team was writing code that handled the entire telecommunications infrastructure of the company across the country, from layer 1 and above.

One of the company's largest, busiest and most business-sensitive lower-level components came with both an API and a GUI. Both were truly terrible and had many issues, although the component did excel in some ways. I started working with this component 12 years ago, and I gained considerable insight from both its good and bad features. Of the many, many systems I have worked on in my life, this is one of those I learned the most from.

One of the issues with the component was its configuration. It was a large, complicated component with many configuration options. By now I'm a bit fuzzy on the details, but one day we had to make a configuration change. We wanted to enable a certain option, let's say "Option A". Attempting to do so through the built-in GUI (and the API) resulted in an error saying "Option A cannot be enabled unless Option B is disabled." That sounded easy enough, and we didn't really care about Option B, so we tried to disable it, only to receive this error: "Option B cannot be disabled unless Option A is enabled."
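To make the deadlock concrete, here is a minimal sketch of how two such rules can lock each other out. It is not the vendor's code: the `Component` class, the `set_option` method and the starting state (Option A off, Option B on) are all assumptions made for illustration. The only things taken from the story are the two error messages.

```python
class ConfigError(Exception):
    """Raised when a configuration change is rejected."""


class Component:
    """Hypothetical stand-in for the real component's configuration."""

    def __init__(self):
        # Assumed starting state: Option A is off, Option B is on.
        self.options = {"A": False, "B": True}

    def set_option(self, name, value):
        # Each rule is checked against the current state, one change at a time.
        if name == "A" and value and self.options["B"]:
            raise ConfigError(
                "Option A cannot be enabled unless Option B is disabled."
            )
        if name == "B" and not value and not self.options["A"]:
            raise ConfigError(
                "Option B cannot be disabled unless Option A is enabled."
            )
        self.options[name] = value


def try_change(component, name, value):
    """Attempt a change; return the error message, or None on success."""
    try:
        component.set_option(name, value)
        return None
    except ConfigError as err:
        return str(err)


component = Component()
print(try_change(component, "A", True))   # rejected because B is still on
print(try_change(component, "B", False))  # rejected because A is still off
```

Because each rule looks only at the current state and changes are applied one at a time, neither change can ever go first. Whether the real component worked this way internally is a guess.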

After a few minutes of confusion, retries and disbelief, I started laughing loudly, which made a colleague of mine come to see what all the fuss was about. "Take a look at this," I told him. I showed him the issue, we laughed, and between chokes of laughter I spontaneously named it "the Circle of Failure!"

The Circle of Failure was funny, on the one hand, but extremely frustrating on the other. First, because we had to contact the manufacturer for support, which practically always resulted in them sending out a fix. Second, because fixes are never immediate, so we had to either find a workaround or live with diminished business. And third, because of the implications, which by then had become clear after working with the product for a while: this extremely business-sensitive product was very badly tested. It takes a very special kind of incompetence to create such a fatal circular dependency.

Since then, I have used this spontaneously created term many times, and it has become kind of a pet peeve of mine. I keep spotting the Circle of Failure in many different places. One which I particularly "like" is the "Police-Municipality Circle", where calling the police to report an issue results in them telling you that it's a municipality issue and to call the municipality instead. So you call the municipality, which tells you it's a police issue and to call the police. You can explain to both sides as many times as you want that you've already spoken with the other one, but it's not going to help. I secretly (well, openly) believe that these kinds of Circles of Failure are intentional and specifically meant to avoid providing needed services to the population, for whatever reason.

You can find the Circle of Failure even in individual human behavior. As "Fat Bastard" — played by Mike Myers in the Austin Powers film series — says: "I can't stop eating. I eat because I'm unhappy, and I'm unhappy because I eat. It's a vicious cycle."

My favorite instances of the Circle of Failure, though, are in software. One of my business's customers allows remote connection to their internal network for work purposes. To do this, they purchased a security system that only works on Microsoft Windows and still requires Internet Explorer. If you want to connect remotely, you must be using Windows and IE. The reason for this is that only on Windows with Internet Explorer are they able to install software on user machines with little to no user interaction. You just open IE, enter the URL for the remote connection page, and a piece of third-party software magically installs itself on your computer. The purpose of this third-party software is to verify that your computer conforms to the company's security policy, or in other words, that your computer is secure enough to be allowed to connect to the company's internal networks. It gets full administrator privileges despite not actually asking the user for permission. But here's the thing: if you were able to install what can only be described as malware on my computer without my consent, then by definition my computer is not secure. So, verifying the security of my computer requires breaking the security of my computer. Great success!

I keep the Circle of Failure in mind whenever I design new software. The idea is well known: divide and conquer. Break things down into small, independent units that are later composed into one cohesive system. For each unit, adopt the Unix philosophy of doing only one thing, and doing it well. These concepts are not difficult to grasp, but they are also not easy to implement, especially in the world of startups, where it's always a "race", there's no time for actual planning, and requirements change on an hourly basis. Codebases are often not even broken down into units correctly. A single project often spans several repositories where the boundaries separating them are extremely unclear. That's why circular dependencies happen so frequently in software projects.
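One way to catch circular dependencies between units early is to write the dependencies down and check them mechanically. The sketch below uses `graphlib` from Python's standard library (3.9 and later). The `find_cycle` helper and the unit names (`billing`, `accounts`, `notifications`) are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(dependencies):
    """Return one dependency cycle as a list of unit names, or None.

    `dependencies` maps each unit to the set of units it depends on.
    """
    try:
        # static_order() is a generator, so consume it to force the check.
        list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of args holds the detected cycle; its first
        # and last entries are the same unit.
        return err.args[1]
    return None


# Hypothetical units that depend on each other in a circle.
units = {
    "billing": {"accounts"},
    "accounts": {"notifications"},
    "notifications": {"billing"},
}

cycle = find_cycle(units)
if cycle is not None:
    print("Circle of Failure:", " -> ".join(cycle))
```

A check like this only helps if the dependency map is kept honest, for example by generating it from import statements or build files rather than maintaining it by hand.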

So keep the Circle of Failure in mind. Write business logic units that are separate from the facilities used to compose them with the rest of the system (e.g., write your APIs as generic libraries, and expose them as HTTP APIs via a separate facility). Use a monorepo during initial development until the natural way of breaking out different modules/components becomes apparent or unavoidable. Learn to use Git's sparse-checkout feature. And use this rule of thumb to verify you've broken your code into units correctly: if you did it correctly, writing unit tests will be easy. If figuring out how to write unit tests is bending your mind, you probably did it wrong.
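As a rough illustration of that separation, the sketch below keeps the business logic in a plain function and wraps it in a thin HTTP adapter built on the standard library's `http.server`. Everything specific here is invented for the example: the `overage_charge_cents` function, its pricing parameters, the `ChargeHandler` class and the `minutes` query parameter.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


# Business logic: a plain library function that knows nothing about HTTP.
def overage_charge_cents(minutes_used, included_minutes=100, cents_per_minute=5):
    """Return the overage charge, in cents, for a hypothetical phone plan."""
    if minutes_used < 0:
        raise ValueError("minutes_used must not be negative")
    return max(0, minutes_used - included_minutes) * cents_per_minute


# Composition facility: a thin HTTP adapter around the library function.
class ChargeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        try:
            minutes = int(query["minutes"][0])
            status, body = 200, {"charge_cents": overage_charge_cents(minutes)}
        except (KeyError, ValueError) as err:
            status, body = 400, {"error": str(err)}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the HTTP facility; deliberately not called at import time."""
    HTTPServer(("127.0.0.1", port), ChargeHandler).serve_forever()


# The unit test needs only the library function, not a running server.
def test_overage_charge_cents():
    assert overage_charge_cents(50) == 0
    assert overage_charge_cents(150) == 250
    try:
        overage_charge_cents(-1)
    except ValueError:
        pass
    else:
        raise AssertionError("negative minutes should be rejected")


test_overage_charge_cents()
```

The test at the bottom is short precisely because the logic does not depend on the server. If testing the charge calculation required starting an HTTP server first, that would be a hint that the units were cut in the wrong place.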