Margin of Safety: An Introduction to the Mental Model

Previously on Farnam Street, we covered the idea of Redundancy — a central concept in both the world of engineering and in practical life. Today we’re going to explore a related concept: Margin of Safety.

The margin of safety is another concept rooted in engineering and quality control. Let’s start there, then see where else our model might apply in practical life, and lastly, where it might have limitations.

* * *

Consider a highly-engineered jet engine part. If the part were to fail, the engine would also fail, perhaps at the worst possible moment—while in flight with passengers on board. Like most jet engine parts, let us assume the part is replaceable over time—though we don’t want to replace it too often (creating prohibitively high costs), we don’t expect it to last the lifetime of the engine. We design the part for 10,000 hours of average flying time.

That brings us to a central question: After how many hours of service do we replace this critical part? The easily available answer might be 9,999 hours. Why replace it any sooner than we have to? Wouldn’t that be a waste of money?

The first problem is, we know nothing of the composition of the 10,000 hours any individual part has gone through. Were they 10,000 particularly tough hours, filled with turbulent skies? Was it all relatively smooth sailing? Somewhere in the middle?

Just as importantly, how confident are we that the part will really last the full 10,000 hours? What if it had a slight flaw during manufacturing? What if we made an assumption about its reliability that was not conservative enough? What if the material degraded in bad weather to a degree we didn’t foresee?

The challenge is clear, and the implication obvious: we do not wait until the part has been in service for 9,999 hours. Perhaps at 7,000 hours, we seriously consider replacing the part, and we put a hard stop at 7,500 hours.

The difference between waiting until the last minute and replacing it comfortably early gives us a margin of safety. The sooner we replace the part, the more safety we have—by not pushing the boundaries, we leave ourselves a cushion. (Ever notice how your gas tank indicator goes on long before you’re really on empty? It’s the same idea.)

The principle is essential in bridge building. Let’s say we calculate that, on an average day, a proposed bridge will be required to support 5,000 tons at any one time. Do we build the structure to withstand 5,001 tons? I’m not interested in driving on that bridge. What if we get a day with much heavier traffic than usual? What if our calculations and estimates are a little off? What if the material weakens over time at a rate faster than we imagined? To account for these, we build the bridge to support 20,000 tons. Only now do we have a margin of safety.

This fundamental engineering principle is useful in many practical areas of life, even for non-engineers. Let’s look at one we all face.

* * *

Take a couple earning $100,000 per year after taxes, or about $8,300 per month. In designing their life, they must necessarily decide what standard of living to enjoy. (The part which can be quantified, anyway.) What sort of monthly expenses should they allow themselves to accumulate?

One all-too-familiar approach is to build in monthly expenses approaching $8,000. A $4,000 mortgage, $1,000 worth of car payments, $1,000/month for private schools…and so on. The couple rationalizes that they have “earned” the right live large.

However, what if there are some massive unexpected expenditures thrown their way? (In the way life often does.) What if one of them lost their job and their combined monthly income dropped to $4,000?

The couple must ask themselves whether the ensuing misery is worth the lavish spending. If they kept up their $8,000/month spending habit after a loss of income, they would have to choose between two difficult paths: Rapidly eating into their savings or considerably downsizing their life. Either is likely to cause extreme misery from the loss of long-held luxuries.

Thinking in reverse, how can we avoid the potential misery?

A common refrain is to tell the couple to make sure they’ve stashed away some money in case of emergency, to provide a buffer. Often there is a specific multiple of current spending we’re told to have in reserve—perhaps 6-12 months. In this case, savings of $48,000-$96,000 should suffice.

However, is there a way we can build them a much larger margin for error?

Let’s say the couple decides instead to permanently limit their monthly spending to $4,000 by owning a smaller house, driving less expensive cars, and trusting their public schools. What happens?

Our margin of safety now compounds. Obviously, a savings rate exceeding 50% will rapidly accumulate in their favor — $4,300 put away by the first month, $8,600 by the second month, and so on. The mere act of systematically underspending their income rapidly gives them a cushion without much trying. If an unexpected expenditure comes up, they’ll almost certainly be ready.

The unseen benefit and the extra margin of safety in this choice comes if either spouse loses their income – either by choice (perhaps to care for a child) or by bad luck (health issues). In this case, not only has a high savings rate accumulated in their favor, but because their spending is systematically low, they can avoid tapping it altogether! Their savings simply stop growing temporarily while they live on one income. This sort of “belt and suspenders” solution is the essence of margin-of-safety thinking.

(On a side note: Let’s take it even one step further. Say their former $8,000 monthly spending rate meant they probably could not retire until age 70, given their current savings rate, investment choices, and desired lifestyle post-retirement. Reducing their needs to $4,000 not only provides them much needed savings, quickly accelerating their retirement date, but they now need even less to retire on in the first place. Retiring at 70 can start to look like retiring at 45 in a hurry.)

* * *

Clearly, the margin of safety model is very powerful, and we’re wise to use it whenever possible to avoid failure. But it has limitations.

One obvious issue, most salient in the engineering world, comes in the tradeoff with time and money. Given an unlimited runway of time and the most expensive materials known to mankind, it’s likely that we could “fail-proof” many products to such a ridiculous degree as to be impractical in the modern world.

For example, it’s possible to imagine Boeing designing a plane that would have a fail rate indistinguishable from zero, with parts being replaced 10% into their useful lives, built with rare but super-strong materials, etc.—so long as the world was willing to pay $25,000 for a coach seat from Boston to Chicago. Given the impracticability of that scenario, our tradeoff has been to accept planes that are not “fail-proof,” but merely extremely unlikely to fail, to give the world safe enough air travel at an affordable cost. This tradeoff has been enormously wise and helpful to the world. Simply put, the margin-of-safety idea can be pushed into farce without careful judgment.

* * *

This brings us to another limitation of the model, which is the failure to engage in “total systems” thinking.

“The reliability that matters is not the simple reliability of one component of a system,
but the final reliability of the total control system.”
— Garrett Hardin in Filters Against Folly

Let’s return to the Boeing analogy. Say we did design the safest and most reliable jet airplane imaginable, with parts that would not fail in one billion hours of flight time under the most difficult weather conditions imaginable on Earth—and then let it be piloted by a drug addict high on painkillers.

The problem is that the whole flight system includes much more than just the reliability of the plane itself. Just because we built-in safety margins in one area does not mean the system will not fail. This illustrates not so much a failure of the model itself, but a common mistake in the way the model is applied.

* * *

This brings us to a final issue with the margin of safety model—naïve extrapolation of past data. Let’s look at a common insurance scenario to illustrate this one.

Suppose we have a 100-year-old reinsurance company – PropCo – which reinsures major primary insurers in the event of property damage in California caused by a catastrophe – most worrying being an earthquake and its aftershocks. Throughout its entire (long) history, PropCo had never experienced a yearly loss on this sort of coverage worse than $1 billion. Most years saw no loss worse than $250 million, and in fact, many years had no losses at all – giving them comfortable profit margins.

Thinking like engineers, the directors of PropCo insisted that the company have such a strong financial position so that they could safely cover a loss twice as bad as anything ever encountered. Given their historical losses, the directors believed this extra capital would give PropCo a comfortable “margin of safety” against the worst case. Right?

However, our directors missed a few crucial details. The $1 billion loss, the insurer’s worst, had been incurred in the year 1994 during the Northridge earthquake. Since then, the building density of Californian cities had increased significantly, and due to ongoing budget issues and spreading fraud, strict building codes had not been enforced. Considerable inflation in the period since 1994 also ensured that losses per damaged square foot would be far higher than ever faced previously.

With these conditions present, let’s propose that California is hit with an earthquake reading 7.0 on the Richter scale, with an epicenter 10 miles outside of downtown LA. PropCo faces a bill of $5 billion – not twice as bad, but five times as bad as it had ever faced. In this case, PropCo fails.

This illustration (which recurs every so often in the insurance field) shows the limitation of naïvely assuming a margin of safety is presently based on misleading or incomplete past data.

* * *

The margin of safety is an important component of some decisions and life. You can think of it as a reservoir to absorb errors or poor luck. Size matters. At least, in this case, bigger is better. And if you need a calculator to figure out how much room you have, you’re doing something wrong.

Margin of safety is part of the Farnam Street Latticework of Mental Models.