What Goodhart’s Law Can Teach You About Performance Data

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Roman Shraga, data scientist at PlaceIQ.

Is there a metric you use to evaluate the effectiveness of something critical to your company’s success? What about a metric used by your company to evaluate you?

If so, it is essential that you understand what could go wrong in the evaluation of performance data. Your job depends on it!

Performance data is the information used to assess the success of something. It’s how you evaluate the effectiveness of an ad campaign, the throughput of an engineering organization or the business attributable to a specific salesperson, for example. Because performance data is directly tied to the key goals of both individuals and organizations, it is a sensitive – and even contentious – topic. It is ripe for obfuscation and abuse.

A critical insight into how to deal with performance data comes from Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.” In other words, when the measure being used by decision-makers to evaluate performance is the same as the target being optimized by those being measured, it is no longer a reliable measure of performance.

The most cited example of this law in effect is the case of nail factories in the Soviet Union. The goal of central planners was to measure performance of the factories, so factory operators were given targets around the number of nails produced. To meet and exceed the targets, factory operators produced millions of tiny, useless nails. When targets were switched to the total weight of nails produced, operators instead produced several enormous, heavy and useless nails.

The above example is absurd, but illustrates the point: When a measure of performance is the same as the target, it can be abused to the point of no longer being useful in measuring the desired outcome.

This happens all the time in the modern world. For example, when CTR is both a measure and a target, ad companies have a perverse incentive to optimize for clicks with absolutely no regard for who is doing the clicking. An ad campaign for Ferrari with a CTR of 15% sounds amazing, unless the majority of people who clicked the ads are teenagers looking at pictures of cool cars.

Similarly, when cases closed is both the measure of performance and target of customer service organizations, employees might choose to close cases without fully investigating and resolving them. When page views are both the measure and the target of news sites and blogs, editors have incentives to post shocking and controversial content to optimize for the target. In the long run, of course, this behavior degrades the quality of the site and the page views measure is no longer a useful indicator of the desired outcome of an engaged user base.

Examples of Goodhart’s Law can be found in every industry and every department of an organization. Fortunately, there are several approaches that can be taken to mitigate its harmful effects.

  1. The first approach is also the most difficult. By thinking deeply about what is being measured and what the constraints are, it is possible to formulate better measurements. A body of knowledge known as the theory of constraints can be used to guide your thought process as you try to come up with a better measure. For example, as an alternative to relying on cases closed as a measure of customer service, a company can learn from Zappos and strive to quantify and reward good experiences as reported by customers. Still, it must be said that there is debate about whether it is even possible to find a single measure that is immune to the effects of Goodhart’s Law.
  2. A second approach could be to create a “balanced scorecard” of several different measures instead of relying on one. With this strategy, you reduce the risk of a single measure being gamed by looking at multiple measures that evaluate performance from different angles. For example, CTR can be supplemented with a measure of traffic quality, such as bounce rate or conversion rate. When you add multiple measures to your overall performance evaluation, you not only reduce the opportunity for abuse, but you begin to get a more nuanced understanding of the inherent tradeoffs being made. This is similar to the dual metrics of precision and recall used in machine learning classification problems. Precision measures how often the machine is right when it gives a positive answer, and recall measures what proportion of the total right answers the machine is able to find. Optimizing either one alone is easy to game; watching both together is not.
  3. A third way to mitigate the effects of Goodhart’s Law is to simply use human discretion. This means poking and prodding a reported performance measure until you develop a true understanding of what it is actually indicating. You need to ask questions that ensure the measure relates to the ultimate goal. Additionally, think about whether it would be possible to get a perfect score on the measure and, if so, whether that could be done without adding any value. This line of reasoning will allow you to dissect a measure until you understand whether or not it is doing a good job of indicating performance.
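
To make the second approach concrete, here is a minimal Python sketch of a scorecard that reports CTR alongside bounce rate and conversion rate, plus the precision and recall calculation mentioned above. Everything in it is an assumption made for illustration: the `CampaignStats` fields, the function names and the campaign numbers are hypothetical and do not come from any real campaign or reporting tool.

```python
from dataclasses import dataclass


@dataclass
class CampaignStats:
    """Hypothetical raw counts for one ad campaign (illustrative only)."""
    impressions: int
    clicks: int
    bounces: int      # clicks that left the landing page right away
    conversions: int  # clicks that ended in the desired action


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 instead of failing when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def scorecard(stats: CampaignStats) -> dict:
    """Report several measures side by side instead of CTR alone."""
    return {
        "ctr": safe_ratio(stats.clicks, stats.impressions),
        "bounce_rate": safe_ratio(stats.bounces, stats.clicks),
        "conversion_rate": safe_ratio(stats.conversions, stats.clicks),
    }


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple:
    """Precision: share of positive answers that were right.
    Recall: share of all right answers that were found."""
    precision = safe_ratio(true_pos, true_pos + false_pos)
    recall = safe_ratio(true_pos, true_pos + false_neg)
    return precision, recall


# Made-up numbers: one campaign optimized purely for clicks,
# one aimed at a narrower audience of likely buyers.
click_chaser = CampaignStats(impressions=10_000, clicks=1_500,
                             bounces=1_350, conversions=3)
targeted = CampaignStats(impressions=10_000, clicks=200,
                         bounces=60, conversions=20)

if __name__ == "__main__":
    for name, stats in [("click_chaser", click_chaser), ("targeted", targeted)]:
        print(name, scorecard(stats))
    print("precision, recall:", precision_recall(8, 2, 8))
```

In this invented data the click-chasing campaign wins on CTR alone, but its bounce rate and conversion rate tell a different story. No single number in the scorecard settles the question, which is the point of looking at several.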

In the end, the most judicious course is a mix of all three approaches to mitigation. You should strive to create the best possible measures that look at performance from multiple angles while always maintaining skepticism and inquiry.

Follow PlaceIQ (@PlaceIQ) and AdExchanger (@adexchanger) on Twitter.


  1. Roman,

    You have made a great point here and I am glad that someone has at last supported it with a law. I submit that one way to keep an alternative measure is to have several that can be brought to bear at any time. One set of primary and secondary measures that satisfies a brand, its agency, and a programmatic partner is a CPM payment model with any number of backend metrics. This provides the separation of goals and metrics and allows the advertiser some flexibility in assessing the success of the campaign without biasing the vendor’s activity.

    For example, at Choicestream – a full-service Demand Side Platform that offers both branding and direct response advertising – we price campaigns in terms of cost per impression, but we also measure CPC and CPA to make sure that we are hitting the brand’s goal.

    Let’s use the example of a retailer that is selling jackets. Measuring purely clicks on the ad shows you engagement, in a way, but measurement needs to go much further than that. In order to make sure that the users who are clicking on the ad are actually potential buyers of the jackets, rather than window shoppers, the measurement needs to go further than the impression of the ad on the site, or the initial click by a user. By incorporating CPA metrics and measuring when a user who views an ad ultimately converts and purchases a jacket, the advertiser has a much more meaningful metric, as you have advocated.

  2. Doug Samuelson

    I don’t quite agree with Goodhart’s Law as stated. A measure like profit, or deaths per 1000 procedures, or number of widgets produced per day, or accidents per month, can remain meaningful even when it becomes a performance target. I’d say that a measure becomes useless not when it becomes A target, but when it becomes THE target; that is, when evaluation focuses on that measure and nothing else. This is especially true when the focus on one measure entails becoming oblivious to how the measure could be gamed.

  3. I’ve not heard previously of Goodhart’s Law. What I learned, without a name attached to it, was: “You get what you reward.”