
The ways that sampling can lead us astray

From Samuel Arbesman’s excellent interview with Michael Mauboussin (on his book The Success Equation):

Arbesman: What are some of the ways that sampling (including undersampling, biased sampling, and more) can lead us quite astray when understanding skill and luck?

Mauboussin: Let’s take a look at undersampling as well as biased sampling. Undersampling failure in business is a classic example. Jerker Denrell, a professor at Warwick Business School, provides a great example in a paper called “Vicarious Learning, Undersampling of Failure, and the Myths of Management.” Imagine a company can select one of two strategies: high risk or low risk. Companies select one or the other and the results show that companies that select the high-risk strategy either succeed wildly or fail. Those that select the low-risk strategy don’t do as well as the successful high-risk companies but also don’t fail. In other words, the high-risk strategy has a large variance in outcomes and the low-risk strategy has smaller variance.

Say a new company comes along and wants to determine which strategy is best. On examination, the high-risk strategy would look great because the companies that chose it and survived had great success while those that chose it and failed are dead, and hence are no longer in the sample. In contrast, since all of the companies that selected the low-risk strategy are still around, their average performance looks worse. This is the classic case of undersampling failure. The question is: What were the results of all of the companies that selected each strategy?
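A quick simulation makes the trap concrete. The payoff numbers below are invented for illustration, not taken from Denrell’s paper: high-risk firms either win big or die, low-risk firms earn a steady middling return, and averaging only the survivors flatters the high-risk strategy.

```python
import random

random.seed(0)

def simulate(n_firms=10_000):
    """Illustrative payoffs: high-risk firms win big or fail; low-risk firms earn a steady return."""
    high_risk = [100 if random.random() < 0.30 else 0 for _ in range(n_firms)]
    low_risk = [40] * n_firms
    return high_risk, low_risk

high, low = simulate()

# The right question: average over ALL firms that chose each strategy.
print("high-risk, all firms:", sum(high) / len(high))   # ~30
print("low-risk,  all firms:", sum(low) / len(low))     # 40

# The survivorship-biased question: failed high-risk firms are dead and unobserved.
survivors = [x for x in high if x > 0]
print("high-risk, survivors only:", sum(survivors) / len(survivors))  # 100
```

Averaged over every firm that chose it, the high-risk strategy is the worse bet; averaged only over the survivors, it looks unbeatable.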

Now you might think that this is super obvious, and that thoughtful companies or researchers wouldn’t do this. But this problem plagues a lot of business research. Here’s the classic approach to helping businesses: Find companies that have succeeded, determine which attributes they share, and recommend other companies seek those attributes in order to succeed. This is the formula for many bestselling books, including Jim Collins’s Good to Great. One of the attributes of successful companies that Collins found, for instance, is that they are “hedgehogs,” focused on their business. The question is not: Were all successful companies hedgehogs? The question is: Were all hedgehogs successful? The second question undoubtedly yields a different answer than the first.
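The two questions are different conditional probabilities, which a toy calculation with purely hypothetical counts makes plain:

```python
# Hypothetical counts, just to separate the two questions.
successful_hedgehogs = 20   # focused firms that succeeded (the ones the books profile)
failed_hedgehogs     = 180  # focused firms that failed (rarely in the sample)
successful_companies = 20   # suppose every profiled success happened to be a hedgehog

# "Were all successful companies hedgehogs?"  ->  P(hedgehog | success)
p_hedgehog_given_success = successful_hedgehogs / successful_companies  # 1.00

# "Were all hedgehogs successful?"            ->  P(success | hedgehog)
p_success_given_hedgehog = successful_hedgehogs / (successful_hedgehogs + failed_hedgehogs)  # 0.10

print(p_hedgehog_given_success, p_success_given_hedgehog)
```

Studying only the winners answers the first question while the advice being sold depends on the second.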

Another common mistake is drawing conclusions based on samples that are small, which I’ve already mentioned. One example, which I learned from Howard Wainer, relates to school size. Researchers studying primary and secondary education were interested in figuring out how to raise test scores for students. So they did something seemingly very logical – they looked at which schools have the highest test scores. They found that the schools with the highest scores were small, which makes some intuitive sense because of smaller class sizes, etc.

But this falls into a sampling trap. The next question to ask is: which schools have the lowest test scores? The answer: small schools. This is exactly what you would expect from a statistical viewpoint, since small samples have large variances. So small schools have the highest and lowest test scores, and large schools have scores closer to the average. Since the researchers looked only at high scores, they missed the point.
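A simulation shows the effect without any real difference between schools. In the sketch below every student’s score is drawn from the same distribution; the only thing that varies is school size (the sizes are assumed for illustration), yet small schools crowd both the top and the bottom of the rankings.

```python
import random
import statistics

random.seed(1)

def school_mean(n_students, mu=500, sigma=100):
    """Average test score for a school whose students all come from the same distribution."""
    return statistics.mean(random.gauss(mu, sigma) for _ in range(n_students))

schools = [("small", school_mean(50)) for _ in range(1_000)] + \
          [("large", school_mean(2_000)) for _ in range(1_000)]

ranked = sorted(schools, key=lambda s: s[1])
bottom_50 = [size for size, _ in ranked[:50]]
top_50    = [size for size, _ in ranked[-50:]]

print("small schools among the top 50:   ", top_50.count("small"))     # nearly all
print("small schools among the bottom 50:", bottom_50.count("small"))  # nearly all
```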

This is more than a case for a statistics class. Education reformers proceeded to spend billions of dollars reducing the sizes of schools. One large school in Seattle, for example, was broken into five smaller schools. It turns out that shrinking schools can actually be a problem because it leads to less specialization—for example, fewer advanced placement courses. Wainer calls the relationship between sample size and variance the “most dangerous equation” because it has tripped up so many researchers and decision makers over the years.
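The equation Wainer has in mind is de Moivre’s expression for the standard deviation of a sample mean: the variability of a group average falls with the square root of group size, so small groups swing far more widely around the true mean than large ones.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$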

Mauboussin also comments on the Red Queen Effect:

… I think the critical distinction is between absolute and relative performance. In field after field, we have seen absolute performance improve. For example, in sports that measure performance using a clock—including swimming, running, and crew—athletes today are much faster than they were in the past and will continue to improve up to the point of human physiological limits. A similar process is happening in business, where the quality and reliability of products have increased steadily over time.

But where there’s competition, it’s not absolute performance we care about but relative performance. This point can be confusing. For example, the analysis shows that baseball has a lot of randomness, which doesn’t seem to square with the fact that hitting a 95-mile-an-hour fastball is one of the hardest things to do in any sport. Naturally, there is tremendous skill in hitting a fastball, just as there is tremendous skill in throwing a fastball. The key is that as pitchers and hitters improve, they improve in rough lockstep, offsetting one another. The absolute improvement is obscured by the relative parity.

This leads to one of the points that I think is most counter to intuition. As skill increases, it tends to become more uniform across the population. Provided that the contribution of luck remains stable, you get a case where increases in skill lead to luck being a bigger contributor to outcomes. That’s the paradox of skill. So it’s closely related to the Red Queen effect.
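Here is a minimal sketch of the arithmetic behind the paradox of skill. It assumes a toy model in which an outcome is simply skill plus luck, each drawn from a normal distribution; as the spread of skill shrinks while the spread of luck stays fixed, luck accounts for a growing share of the variance in outcomes.

```python
import random
import statistics

random.seed(2)

def luck_share(skill_sd, luck_sd=1.0, n=100_000):
    """Fraction of outcome variance attributable to luck when outcome = skill + luck."""
    outcomes = [random.gauss(0, skill_sd) + random.gauss(0, luck_sd) for _ in range(n)]
    return luck_sd**2 / statistics.variance(outcomes)

# As skill gets more uniform across the population, luck matters more to who wins.
for skill_sd in (2.0, 1.0, 0.5, 0.25):
    print(f"skill spread {skill_sd:>4}: luck explains roughly {luck_share(skill_sd):.0%} of outcome variance")
```

Nothing about luck changes in this model; only the gap between the most and least skilled narrows, and that alone is enough to make results look more random.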