Bias from Incentives and Reinforcement

“Never, ever, think about something else when you should be thinking about the power of incentives.”

— Charlie Munger

According to Charlie Munger, few forces are more powerful than incentives. In his speech “The Psychology of Human Misjudgment,” he admits that their power never stops surprising him:

Well, I think I’ve been in the top 5% of my age cohort all my life in understanding the power of incentives, and all my life I’ve underestimated it. And never a year passes but I get some surprise that pushes my limit a little farther.

Sometimes the solution to a behavior problem is simply to revisit incentives and make sure they align with the desired goal. Munger talks about Federal Express, which is one of his favorite examples of the power of incentives:

The heart and soul of the integrity of the system is that all the packages have to be shifted rapidly in one central location each night. And the system has no integrity if the whole shift can’t be done fast. And Federal Express had one hell of a time getting the thing to work.
And they tried moral suasion, they tried everything in the world, and finally somebody got the happy thought that they were paying the night shift by the hour, and that maybe if they paid them by the shift, the system would work better. And lo and behold, that solution worked.

If you’re trying to change a behavior, reason will take you only so far. Reflecting on another example where misaligned incentives hampered the sales of a superior product, Munger said:

Early in the history of Xerox, Joe Wilson, who was then in the government, had to go back to Xerox because he couldn’t understand how their better, new machine was selling so poorly in relation to their older and inferior machine. Of course when he got there, he found out that the commission arrangement with the salesmen gave a tremendous incentive to the inferior machine.

Ignoring incentives almost never works out well. Thinking about the incentives of others is necessary to create win-win relationships.

We can turn to psychology to obtain a more structured and thorough understanding of how incentives shape our actions.

The Science of Reinforcement

The science of reinforcement was furthered by Burrhus Frederic Skinner (usually called B.F. Skinner), a professor of psychology at Harvard from 1958 to 1974.

Skinner, unlike many of his contemporaries, refused to speculate about what happened on the inside (what people or animals thought and felt) and preferred to focus on what could be observed. To him, how much people ate meant more than subjective measures, such as how hungry they felt or how much pleasure they got from eating. He wanted to find out how environmental variables affected behavior, and he believed that behavior is shaped by its consequences.

If we don’t like the consequences of an action we’ve taken, we’re less likely to do it again; if we do like the consequences, we’re more likely to do it again. That assumption is the basis of operant conditioning, “a type of learning in which the strength of a behavior is modified by [its] consequences, such as reward or punishment.” 1

One of Skinner’s most important inventions was the operant conditioning chamber, also known as a “Skinner box,” which was used to study the effects of reinforcers on lab animals. The rats in the box had to figure out how to do a task (such as pushing a lever) that would reward them with food. Such an automated system allowed Skinner and thousands of successors to study conditioned behavior in a controlled setting.

Decades of research on reinforcement have shown that consistency and timing play important roles in shaping new behaviors. Psychologists argue that the best way to learn a new behavior is through continuous reinforcement, in which the desired behavior is reinforced every time it’s performed.

If you want to teach your dog a new trick, for example, it is smart to reward him for every correct response. At the very beginning of the learning curve, failing to reward a correct response right away may lead the dog to conclude that the behavior was wrong.

Intermittent reinforcement is given only some of the times the desired behavior occurs, and it can follow various schedules, some predictable and some not (see “Scheduling Reinforcement,” below). Intermittent reinforcement is argued to be the most efficient way to maintain an already learned behavior, for three reasons.

First, delivering a reward interrupts the very behavior it is meant to sustain. Paying an assembly-line worker after every single piece would simply not make sense.

Second, intermittent reinforcement is better from an economic perspective. Not only is it cheaper not to reward every instance of a desired behavior, but by making the rewards unpredictable, you trigger excitement and thus get an increase in response without increasing the amount of reinforcement. Intermittent reinforcement is how casinos work; they want people to gamble, but they can’t afford to have people win large amounts very often.

Finally, intermittent reinforcement can make a behavior resistant to extinction, the fading of a behavior once reinforcement stops. Consider this example from the textbook Psychology: Core Concepts:

Imagine two gamblers and two slot machines. One machine inexplicably pays off on every trial and another, a more usual machine, pays on an unpredictable, intermittent schedule. Now, suppose that both devices suddenly stop paying. Which gambler will catch on first?

Most of us would guess correctly:

The one who has been rewarded for each pull of the lever (continuous reinforcement) will quickly notice the change, while the gambler who has won only occasionally (on partial reinforcement) may continue playing unrewarded for a long time.
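The gambler comparison can be made concrete with a toy model. The sketch below is purely illustrative and rests on an assumption of ours, not on anything in the textbook: once the payouts stop, a gambler keeps pulling until the current losing streak is longer than any losing streak experienced while the machine was still paying. The function names and the 10% payout rate are hypothetical.

```python
import random


def longest_dry_streak(payouts):
    """Length of the longest run of consecutive unrewarded pulls."""
    longest = current = 0
    for paid in payouts:
        current = 0 if paid else current + 1
        longest = max(longest, current)
    return longest


def pulls_before_quitting(training_payouts):
    """Hypothetical quitting rule (an assumption, not an empirical law):
    after the machine stops paying, keep pulling until the losing streak
    exceeds the longest one seen while the machine still paid."""
    return longest_dry_streak(training_payouts) + 1


rng = random.Random(42)
continuous = [True] * 200                                # pays on every pull
intermittent = [rng.random() < 0.1 for _ in range(200)]  # pays on ~10% of pulls

print(pulls_before_quitting(continuous))  # 1
print(pulls_before_quitting(intermittent))
```

Under this assumed rule, the continuously rewarded gambler quits after a single unrewarded pull, because he has never experienced even one loss, while the intermittently rewarded gambler has already learned to sit through long dry spells.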

Scheduling Reinforcement

Intermittent reinforcement can be delivered on various schedules, each with its own degree of effectiveness and its own appropriate uses. Ratio schedules are based on the number of responses (the amount of work done), whereas interval schedules are based on the time elapsed between reinforcers.

  • Fixed-ratio schedules reward a set amount of work, as when you pay employees according to how much they produce. They are common in freelancing, where contractors are paid on a piecework basis. Managers like fixed-ratio schedules because the response rate is usually very high (if you want to get paid, you do the work).
  • Variable-ratio schedules are unpredictable because the number of responses between reinforcers varies. Telemarketers, salespeople, and slot machine players are on this schedule because they never know when the next sale or the next big win will occur. Skinner himself demonstrated the power of this schedule by showing that a hungry pigeon would peck a disk 12,000 times an hour while being rewarded on average for only every 110 pecks. Unsurprisingly, this schedule normally produces more responses than any other. (Varying the time between reinforcers, as a variable-interval schedule does, is another way of making reinforcement unpredictable, but if you want people to feel appreciated, this kind of schedule is probably not the one to use.)
  • Fixed-interval schedules are the most common type of payment: they reward people for the time spent on a task. As you might guess, the response rate on this schedule is very low. Even a rat in a Skinner box programmed for a fixed-interval schedule learns that lever presses beyond the required minimum are just a waste of energy. Ironically, the 9-to-5 job remains the preferred way to reward employees in business.
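The three schedules above can be expressed as simple rules for deciding whether a given response earns a reward. The Python sketch below is a simplified model, not a standard library or a reconstruction of Skinner’s apparatus. The class names are hypothetical, and drawing the variable-ratio requirement uniformly between 1 and 2 * mean - 1 is our own assumption (it keeps the average requirement at `mean`).

```python
import random


class FixedRatio:
    """Reinforce every n-th response (like piecework pay)."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging `mean`.
    Assumption: each requirement is drawn uniformly from 1..2*mean-1."""

    def __init__(self, mean, rng):
        self.mean = mean
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made at least `interval` time units
    after the previous reinforcer (or after the start, at t=0)."""

    def __init__(self, interval):
        self.interval = interval
        self.last = 0

    def respond(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False


def count_rewards(schedule, response_times):
    """Feed responses at the given times to a schedule; count reinforcers."""
    return sum(schedule.respond(t) for t in response_times)


# Fixed ratio: 100 responses at n=10 earn exactly 10 rewards.
print(count_rewards(FixedRatio(10), range(100)))  # 10

# Fixed interval: responding every second or only every 10 seconds
# over the same 100 seconds earns the same number of rewards.
busy = count_rewards(FixedInterval(10), range(100))         # t = 0, 1, ..., 99
lazy = count_rewards(FixedInterval(10), range(0, 100, 10))  # t = 0, 10, ..., 90
print(busy, lazy)  # 9 9

# Variable ratio with the pigeon figures: 12,000 pecks, one reward per
# 110 pecks on average. The exact count depends on the random seed.
rewards = count_rewards(VariableRatio(110, random.Random(0)), range(12_000))
print(rewards)
```

The fixed-interval comparison mirrors the rat that learns extra lever presses are wasted: responding ten times as often earns no additional rewards. The variable-ratio run uses the pigeon figures from the text, where 12,000 pecks at one reward per 110 pecks on average works out to roughly 109 rewards an hour.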

While a well-designed reinforcement schedule can be a powerful way to sustain or amplify a specific behavior, we may still fail to recognize an important aspect of reinforcement: individual preferences for specific rewards.

Our most basic reinforcers, food and water, are tied directly to survival. However, most of us don’t live in conditions of extreme scarcity, so the rewards that appeal to us vary from person to person.

Culture plays an important role in determining effective reinforcers. And what’s reinforced shapes culture. Offering tickets to a cricket match might serve as a powerful reward for someone in a country where cricket is a big deal, but would be meaningless to most Americans. Similarly, an air-conditioned office might be a powerful incentive for employees in Indonesia, but won’t matter as much to employees in a more temperate area.

What About Punishment?

So far we’ve talked about positive reinforcement — the carrot, if you will. However, there is also a stick.

There is no doubt that our society relies heavily on threat and punishment to keep people in line. Still, we keep arriving late, forgetting birthdays, and receiving parking fines, even though we know we might be punished.

There are several reasons that punishment might not be the best way to alter someone’s behavior.

First of all, Skinner observed that the power of punishment to suppress behavior usually disappears when the threat of punishment is removed. Indeed, many of us refrain from browsing social networks at work only when we know our boss is around, and we stick to the speed limit only when we know a police patrol is watching.

Second, punishment often triggers a fight-or-flight response. When punished, we seek to escape further punishment, and when escape is blocked, we may become aggressive. This punishment-aggression link may also help explain why abusive parents often come from abusive families themselves.

Third, punishment inhibits the ability to learn new and better responses. Punishment leads to a variety of responses — such as escape, aggression, and learned helplessness — none of which aid in the subject’s learning process. Punishment also fails to show subjects what exactly they must do and instead focuses on what not to do. This is why environments that forgive failure are so important in the learning process.

Finally, punishment is often applied unequally. We are ruled by bias in our assessment of who deserves to be punished. We scold boys more often than girls, physically punish grade-schoolers more often than adults, and control members of racial minorities more often (and more harshly) than whites.

What Should I Do Instead?

There are three alternatives that you can try the next time you feel tempted to punish someone.

The first, extinction, we have already touched upon. A response will usually diminish or disappear if it ceases to produce the rewards it once did. However, for extinction to work, every possible source of reinforcement must be withheld, which is far more difficult in real life than in a lab setting.

What makes it especially difficult is that during extinction, organisms tend to try novel ways of obtaining reinforcement. A whining child, for example, will redouble her efforts or change tactics to regain a parent’s attention before finally giving up. A better strategy here combines two methods: withholding attention after the whining starts and rewarding more desirable behaviors with attention before any whining occurs.

The second alternative is to use preferred activities as reinforcers. For example, people who exercise regularly (and enjoy it) might use a daily run as a reward for getting other tasks done. Similarly, young children can learn to sit still when they are rewarded with occasional permission to run around and make noise. The underlying idea, known as the Premack principle, is that a preferred activity, such as running around, can be used to reinforce a less preferred one.

Finally, prompting and shaping are two actions we can use together to change behavior in an iterative manner. A prompt is a cue or stimulus that encourages the desired behavior. When shaping begins, any approximation of the target response is reinforced. Once you see the approximation occurring regularly, you can make the criterion for the target more strict (the actual behavior has to match the desired behavior more closely), and you continue narrowing the criteria until the specific target behavior is performed. This tactic is often the preferred method of developing a habit gradually and of training animals to perform a specific behavior.
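Shaping amounts to a loop: reinforce anything within a tolerance of the target, and tighten the tolerance once the learner meets it regularly. The sketch below models only that criterion-tightening step (prompts are left out). Treating behavior as a single number, defining “regularly” as three reinforced attempts in a row, and halving the tolerance each time are all illustrative assumptions, and the function name is hypothetical.

```python
def shape(attempts, target, tolerance, min_tolerance, streak_needed=3):
    """Reinforce any attempt within `tolerance` of `target`. After
    `streak_needed` reinforced attempts in a row, halve the tolerance
    (a stricter criterion), but never go below `min_tolerance`.
    Returns (attempt, reinforced, tolerance_in_effect) per attempt."""
    log = []
    streak = 0
    for attempt in attempts:
        reinforced = abs(attempt - target) <= tolerance
        log.append((attempt, reinforced, tolerance))
        streak = streak + 1 if reinforced else 0
        if streak >= streak_needed:
            # The approximation now occurs regularly: tighten the criterion.
            tolerance = max(min_tolerance, tolerance / 2)
            streak = 0
    return log


attempts = [3, 4, 5, 5, 6, 7, 8, 6, 9, 9, 10]  # hypothetical: seconds sat still
log = shape(attempts, target=10, tolerance=8, min_tolerance=1)
print([a for a, ok, _ in log if not ok])  # [5, 6]
print([tol for _, _, tol in log])  # [8, 8, 8, 4.0, 4.0, 4.0, 4.0, 2.0, 2.0, 2.0, 2.0]
```

Here the target is, hypothetically, 10 seconds of sitting still. An attempt of 5 seconds is reinforced while the criterion is loose but not after it has been tightened, which is the narrowing of criteria the paragraph above describes.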

***

I hope that you are now better equipped to recognize incentives as powerful forces shaping the way we and others behave. The next time you wish someone would change the way they behave, think about changing their incentives.

Like any parent, I experiment with my kids all the time. One of the most effective things I do when one of them has misbehaved is to acknowledge my child’s feelings and ask him what he was trying to achieve.

When one kid hits the other, for example, I ask him what he was trying to accomplish. Usually, the response is “He hit me. (So I hit him back.)” I know this touches on an automatic human response that many adults can’t control, which makes me wonder how I can help my kids find more effective responses.

“So, you were angry and you wanted him to know?”

“Yes.”

“People are not for hitting. If you want, I’ll help you go tell him why you’re angry.”

Tensions dissipate. And I’m (hopefully) starting to get my kids thinking about effective and ineffective ways to achieve their goals.

Punishment works best to prevent actions, whereas incentives work best to encourage them.

Let’s end with an excellent piece of advice on incentives. Here is Charlie Munger, speaking at a University of Southern California commencement:

You do not want to be in a perverse incentive system that’s causing you to behave more and more foolishly or worse and worse — incentives are too powerful a control over human cognition or human behavior. If you’re in one [of these systems], I don’t have a solution for you. You’ll have to figure it out for yourself, but it’s a significant problem.

Footnotes