The Attribution Error

“Data-Driven Thinking” is a column written by members of the media community and containing fresh ideas on the digital revolution in media.

Jeremy Stanley is SVP Product and Data Sciences for Collective.

As an industry we have largely concluded that existing measurement solutions (CTR, view-through and click-through conversion) have glaring flaws. And so we have turned to independent vendors (see Forrester on Interactive Attribution) to employ sophisticated algorithmic attribution solutions to value digital advertising impressions. These solutions cater to our desire to glean definitive and actionable data about what works from the oceans of data exhaust thrown off by our digital campaigns.

Yet algorithmic attribution is founded on a fatally flawed assumption – that causation (a desired outcome happened because of an advertisement) can be determined without experimentation – the classic scientific model of test and control.

No medicine is FDA approved, no theory accepted by the scientific community absent rigorous experimental validation. Why should advertising be any different?

Consider that there are two driving forces behind a consumer conversion.  The first is the consumer’s inherent propensity to convert. Product fit, availability, and pricing all predispose some consumers to be far more likely to purchase a given product than others.

The second is the incremental lift in conversion propensity driven by exposure to an advertisement. This is a function of the quality of the creative, the relevance of the placement and the timing of the delivery.

To determine how much value an advertising impression created, an attribution solution must tease out the consumer’s inherent propensity to convert from the incremental lift driven by the ad impression. Algorithmic attribution solutions tackle this by identifying which impressions are correlated to future conversion events. But the operative word here is correlated – which should not be confused with caused.

By and large, algorithmic attribution solutions credit campaigns for delivering ads to individuals who were likely to convert anyway, rather than creating value by driving incremental conversions higher!

To highlight this problem, let’s consider retargeting. Suppose that an advertiser delivered at least one advertisement to every user in their retargeting list (users who previously visited their home page). Then, suppose that 10% of these users went on to purchase the advertised product.

In this simple example, it is impossible to tell what impact the advertising had. Perhaps it caused all of the conversions (after all, every user who converted saw an ad). Or perhaps it caused none of them (those users did visit the home page; maybe they would have converted anyway). Either conclusion could be correct.
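That symmetry is easy to verify with a quick simulation. The sketch below (hypothetical numbers, not real campaign data) produces the same observed outcome, every user exposed and roughly 10% converting, whether the ad causes none of the conversions or all of them:

```python
import random

def simulate(n, baseline_rate, ad_lift):
    """Simulate n retargeted users, all of whom see the ad.
    A user converts with probability baseline_rate + ad_lift."""
    conversions = sum(
        1 for _ in range(n) if random.random() < baseline_rate + ad_lift
    )
    return conversions / n

random.seed(0)
# Scenario 1: the ad causes nothing; users convert at 10% on their own.
rate_no_effect = simulate(100_000, baseline_rate=0.10, ad_lift=0.00)
# Scenario 2: the ad causes everything; no one converts without it.
rate_all_effect = simulate(100_000, baseline_rate=0.00, ad_lift=0.10)

# Both scenarios yield an observed conversion rate near 10%; with every
# user exposed, the logged data cannot tell the two worlds apart.
print(rate_no_effect, rate_all_effect)
```

No amount of modeling on the exposed-only log can recover which scenario generated it; only withholding the ad from some users breaks the tie.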

Real-world scenarios only get more complicated. Biases arise from cookie deletion, variation in Internet usage and complex audience targeting executed across competing channels and devices. Sweeping these concerns aside and hoping that an algorithm can just ‘figure it out’ is a recipe for disaster.
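The targeting bias in particular is easy to see. In the hypothetical sketch below, “in-market” users are both more likely to be targeted and more likely to convert on their own, so under observational exposure the exposed group is dominated by them, while random assignment keeps the exposed group representative of the population:

```python
import random

random.seed(1)
n = 100_000
# Hypothetical confounder: "in-market" users both see more ads and
# convert more on their own, regardless of the ad.
users = [{"in_market": random.random() < 0.2} for _ in range(n)]

for u in users:
    # Observational exposure: targeting chases in-market users.
    u["exposed_obs"] = random.random() < (0.8 if u["in_market"] else 0.1)
    # Randomized exposure: a coin flip, independent of everything.
    u["exposed_rct"] = random.random() < 0.5

def in_market_rate(users, flag):
    """Share of in-market users among those exposed under `flag`."""
    group = [u for u in users if u[flag]]
    return sum(u["in_market"] for u in group) / len(group)

# Under targeting, the exposed group is mostly in-market (~67% here);
# under randomization it mirrors the population (~20%).
print(in_market_rate(users, "exposed_obs"))
print(in_market_rate(users, "exposed_rct"))
```

Any model that compares exposed to unexposed users in the observational world is largely measuring the targeting, not the ad.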

Instead, the answer is to conduct rigorous A/B experiments. For a given campaign, a set of random users is held out as a control group, and their behavior is used to validate that advertising in the test group is truly generating incremental conversion or brand lift.
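In practice the holdout must be deterministic, so that no ad server anywhere ever shows an ad to a control user. One minimal way to sketch this (the hashing scheme and the 5% holdout are illustrative assumptions, not any vendor’s actual implementation):

```python
import hashlib

def assignment(cookie_id: str, holdout_pct: float = 0.05) -> str:
    """Deterministically assign a cookie to 'control' (never shown the
    ad) or 'test' by hashing its ID, so the split is stable across
    ad servers without any shared state."""
    digest = hashlib.sha256(cookie_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "control" if bucket < holdout_pct else "test"
```

Because assignment is a pure function of the cookie ID, every server reaches the same verdict without coordinating, which is what makes a real-time control check in the ad-serving path feasible.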

Further, through careful analysis of audience data, one can identify the ‘influenceables’ – pockets of audiences who are highly receptive to an advertising message, and will generate outsized ROI for a digital advertising campaign.
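Once a holdout exists, finding the influenceables reduces to comparing test and control conversion rates segment by segment. A sketch with made-up numbers (the segment names, counts and the two-proportion z-test are illustrative choices):

```python
from math import sqrt

def segment_lift(results):
    """For each audience segment, compare conversion rates between
    randomly exposed (test) and held-out (control) users.

    results: {segment: (test_conv, test_n, ctrl_conv, ctrl_n)}
    Returns {segment: (absolute_lift, z_score)}, sorted by lift.
    """
    out = {}
    for seg, (tc, tn, cc, cn) in results.items():
        p_t, p_c = tc / tn, cc / cn
        lift = p_t - p_c
        p = (tc + cc) / (tn + cn)  # pooled rate for the z-test
        se = sqrt(p * (1 - p) * (1 / tn + 1 / cn))
        out[seg] = (lift, lift / se if se else 0.0)
    return dict(sorted(out.items(), key=lambda kv: -kv[1][0]))

# Hypothetical campaign: high-propensity "cart abandoners" show little
# lift (10.5% vs 10.0%), while low-propensity "category browsers" are
# the influenceables (3.0% vs 2.0%).
lifts = segment_lift({
    "cart_abandoners":   (1050, 10_000, 1000, 10_000),
    "category_browsers": (300, 10_000, 200, 10_000),
})
```

In this made-up example the low-propensity browsers show twice the absolute lift of the high-propensity abandoners, the very pattern the next paragraph describes.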

My own observation, across numerous campaigns, is that consumers with a high inherent propensity to convert tend to be the least influenceable! Many of these consumers have already made up their minds to purchase the product. Showing them yet another digital advertisement is a waste of money.

Yet that is precisely what many advertisers reward today: serving ads to audiences who are likely to convert anyway in order to gather credit in last-view attribution schemes. Algorithmic attribution might make this marginally better (at least credit is distributed over multiple views), but at significant expense.

Advertisers would be far better served if attribution providers invested in experimentation instead. However, I anticipate that many attribution vendors will fight this trend. The only rigorous way to experiment is to embed a control group in the ad serving decision process that is checked in real time, to ensure specific users are never shown an advertisement. This approach is radically different from the prevailing attribution strategy of “collect a lot of data and throw algorithms at it.”

By leveraging experimentation coupled with audience insights, savvy marketers can extract far more value from their digital advertising dollars. Those who do so now will gain significant competitive advantages.



  1. Great article Jeremy. I completely agree that attribution is at its core a causal question. A standard A/B test is the best way to get attribution for a single client. However, running a controlled experiment across 15-20 different vendors using various targeting strategies and ad channels might just be out of the question. I have to disagree that causal estimation can’t be done in observational data, but I will attest that one is more likely to get biased results in an observational study if all of the data assumptions aren’t carefully verified. So yes, experimental design is certainly safer, but again, it might be impossible to execute in common campaign settings. The alternative, algorithmic attribution (or rather, causal attribution on observational data) is likely to be biased because no attribution vendor will have all the data necessary to truly adjust for all confounding conditions. My question is: until someone can figure out how to do a proper controlled experiment across many channels, do we simply throw out existing algorithmic solutions because they are likely to have some bias? A slightly biased model is certainly better than no model. And like most things scientific, progress is made incrementally.

    • Matt Anthony

      “A slightly biased model is certainly better than no model. ”

      Disagree … bad intel (particularly when taken as fact) is as bad or worse than no intel.

      The problem with these attribution “models” is you can’t measure the “goodness of fit” on any of them to compare across models … because the vast majority of the “models” are simply assumptions themselves. Is 30/30/40 “better” than 70/20/10? How can you tell … what statistic measures “Better”?

      The methods you seek actually DO exist … they’re not easy (Definitely grad school stats stuff) but they’re out there. The greater problem is that no one wants to hear the answer – especially if the answer sounds like “a large percentage of your marketing dollars are incrementally ineffective.” CMOs lose face; channel managers lose budget; agencies lose credibility; the media partners lose revenue.

      Bluntly, the current guise of attribution models is still a heavy dose of subjectivity and assumption masquerading as math to tell a story however someone wants it to look.

      • There’s definitely some subjectivity in attribution reporting. However, there is also solid information that can be leveraged to help determine where ad budget should be spent.

        For example, if you repeatedly see that an impression or a click from Display Ad A always occurs right before a click on Search Ad B and leads to a bunch of conversions (Display A –> Search B –> Conversion), are you going to ignore that info? I wouldn’t. That little nugget of information is telling me that the display campaign is working particularly well with that search campaign. That’s a red alert to get my display and search teams to focus on that campaign.

        That’s hard data. There’s no assumptions being generated there. Without the attribution, I wouldn’t have known which channels worked well together.

      • Matt Anthony

        The devil is in the details though – in terms of how you assess the significance of that pattern. Are you saying that just looking at the patterns in conversions is enough on its own? How do you know that same pattern you describe isn’t as or more prevalent in paths that don’t convert?

        That notwithstanding, the concept of using such data the way you describe it (directional intelligence) makes sense, and is certainly more palatable analytically than how it’s being used today. Most folks who have bought into the current level of attribution tech seem to somehow think it’s (a) showing them some sort of causal pathway, (b) allowing them to calculate “true ROI” in more exact numerical terms, and/or (c) allowing them to perform precise “optimization” of marketing spend. The first two are most certainly fallacies of the post hoc ergo propter hoc type. If one wants to assert something like causality, then they damn well better be able to explain the burden of proof for proving such a thing for it to be believable.

        I guess the long & short of my message is this: If the marketing community wants to start using “grown-up” mathematical/statistical terms like “optimization”, “causality”, etc, then those same folks need to stop fearing the math itself, stop making things up to avoid higher math, and face facts that true math/stat analysis is potentially difficult and humbling (i.e. you may not like the answer).

      • I should qualify my statement then. A biased model that has been developed on trusted data, by skilled analysts, using accepted methodologies (like training/test sets or cross-validation) is better than no model. Sometimes bias is an inherent property of the data available (especially in modeling causal effects). Under these conditions, a slightly biased model is certainly better than no model (which in and of itself is a highly biased model).

        I definitely agree that there need to be better methods for assigning confidence intervals to the estimates. But again, my main point is that the fact that perfection is slightly out of reach shouldn’t stop us from making progress in this subject. Discovery is an incremental process, and if researchers can continue publishing and talking about their work, something valuable and trusted will emerge. This particular article shows tremendous progress, in that attribution is posed as a causal problem. The debate here is more about estimation of the causal parameters (i.e., controlled vs. observational experimentation).

  2. I like how you’ve mentioned the example of how retargeting might just be picking up people who would convert anyway. I fully agree that an A/B test is the best way to prove the worth of these tactics; however, in the case of multi-channel, or even single-channel with multiple tactics/exposures, a structured test is a near impossibility. It would be great to know if there are any resources that help break that down. Until we have that, attribution algorithms seem to be the only alternative.

    You rightly mention that the majority of advertisers and publishers seek to game the attribution model to claim conversions that would have happened anyway. So even if it costs more in the short term to implement, I’d expect that the savings will outweigh the costs for advertisers who have a significant portion of their budgets allocated to Affiliates/CPA partners.

    These are the guys who would hurt the most from changes to the current model. But in the end it only helps us prove the real worth of Advertising.

  3. Both methods have their flaws; it’s a question of which is ‘less flawed’. And I think it really comes down to marketers spending extra $ for a clean test, regardless of approach.

    The flaw with algorithms is that they are about correlation vs causation, yes. The way to try to isolate measurement of ‘true incrementality’ from ‘false positives’ is with enough data. If you have a time series of data that includes both ‘in market’ and ‘out of market’ data, or variations in execution, you can look to create real correlations between cause and effect, stimulus and response. This is at best a real correlation, not causation like you say, but good enough?

    The challenge with A/B in the real world, as the other commenters explain, is creating a clean control group for a broadcast media. An ‘unexposed’ group for Geico or AmEx ads is pretty darn hard to create. Especially using ad server cookies to create control groups, which is the technique many advertisers lean on. Investing in a true control study that uses people (panelists) and not cookies is a better approach, but this gets expensive.

    The net for me, is you need to spend money for a real study – real $ on a good model, or real $ on good control group. Unfortunately that’s where a lot of tests fall flat. You get what you pay for.

  4. Your retargeting example is a perfect scenario since the first thing I would tell a marketer who’s struggling with an attribution problem is to never pay for view-through conversions. You’re right that there’s no way to tell if those users were likely to convert anyway. To me it seems the challenge is less a question of attribution and more a question of allocation of budget. Even if you truly could understand the percent of influence each channel had on every purchase decision, you would be hard pressed to figure out how to slice up that dollar across multiple vendors with multiple business models.

  5. Attribution must also take into account the full spectrum of what ads are intended to accomplish, not merely the last click towards a purchase. High-consideration purchases tend to convert after a search, when the consumer has already made up their mind. But search should not get all the credit if a display ad created awareness, a pre-roll enhanced interest or a mobile ad suggested an offer. Also, attribution through algorithm can only work when the purchase is made online. Almost no one buys a car online, but many people configure and price online. If configuration is a proxy for conversion, though, there are many multiples more conversions than cars purchased. Finally, attribution must take into account all ad platforms, not just digitally delivered ads. While I agree that an experimental model is the best way to draw conclusions, I am certain of two things: (1) media agencies are generally not able to handle the complexity of planning for an experimental model, and (2) an experimental model would be highly challenged to account for all variables. That doesn’t mean it’s not the right answer, only that it will always be an answer that includes some fuzz.

  6. Algorithmic attribution solutions are not slightly biased; they are extremely biased. In the simple retargeting example I gave, they cannot tell if the advertising caused 0% of the conversions or 100% of them. No amount of math or manipulation of data can overcome that kind of fundamental limitation.

    On the other hand, executing controlled experiments is really not that hard. You may not be able to answer every question you might have about your campaign (as promised by algorithmic attribution solutions), but at least you know that you’ll have the right answer to the most important ones!

    I recommend to advertisers the following strategy:

    Start with a small control group of stable cookies, and ensure that group is excluded from all advertising you are testing. This gives answers to three questions:

    1. What impact is your entire digital spend having on consumers – is it causing them to change their behavior, and if so, by how much?
    2. Is your digital spend becoming more or less effective over time?
    3. Which audiences are most ‘influence-able’ by your campaign? (Note that this gives enormous insight into how you should optimize.)

    To gain deeper insight into which channels of spend are most productive, establish further independent control groups from which each channel, publisher or vendor is held out in turn. From this, you can answer the question:

    4. How much incremental value does a particular channel / vendor / publisher drive?

    Finally, to gain deeper insight into which creatives are most effective, serve a PSA to a random control group for each creative. Then you can answer the question:

    5. Which creative is most effective?

  7. This is why I still prefer an equal attribution model over some of the deeper algorithmic solutions that attempt to provide a “weight” or a “score” for the referring ad. For every variable that exists within that algorithmic formula, chances are I can find another variable that should have been included.

    The truth is (and many studies have proved) that a majority of time, when multiple ad channels are leveraged, the highest ROI is achieved. For example, a combination of search + display drives a higher ROI versus using either channel alone.

    I will say though that display was getting a bad rap until attribution reporting came along. In 2006 we (at Yahoo) were pushing the concept of an Assist to our advertisers who were both running search and display ads with us. While the Assist report only provided click data (no view through assists), it did help clients understand that while display ads did not convert well on their own, they were valuable at supporting the path to conversion via search. At which point we saw display spends increase and ROI increase along with it (as more budget was spread across specific display and search campaigns).

  8. If you wonder why these attribution discussions for the most part occur within the display world, it’s because search knows it’s getting a better deal than it deserves and therefore doesn’t gain by talking about it.

    That’s all the proof you need that last click is broken, and bravo to the attribution vendors for at least attempting to move the industry forward. Much more good is done in the here & now by their flawed approach than will ever come from the unrealistic notion of each advertiser holding itself to such a high scientific bar.

    >50% of paid search queries and clicks are driven by display, TV and other channels, and incrementally fixing attribution is much more important than being uber-scientific IMHO.

  9. In response to Jeremy’s response:

    How would you ‘exclude cookies’ from paid search? I can see how this would be possible for display. Can you explain the logistics around doing this?

    I really enjoyed your article. I completely agree that algorithmic attribution wrongly assumes causation from correlation and implementing AB testing is the right way to go. I am struggling with the practicalities of doing this though.

    • In response to Nathan’s question re: excluding cookies from paid search.

      You raise a very important point. My article was primarily focused on display, video and in some circumstances mobile.

      To the best of my knowledge this is not possible within paid search on the major engines (Google, Bing). Those platforms are technologically closed and so as a 3rd party there is no way to check a cookie either in the purchase decision or in the creative / message delivery.

      The reality is that paid search benefits the most from the flaws in current attribution solutions. There is simply no way to determine if the user who clicked on the paid search would not have clicked on the organic search link, or later typed in the URL and purchased anyways.

      The search engines know that. And I don’t think they are likely to allow for this kind of experimentation unless there is significant demand for it from advertisers.

      Note that the same is true of social platforms (Facebook, Twitter). They are closed systems that you cannot conduct experiments in. But I think it is very much in their interest to prove that their ads drive changes in user behavior. And so I expect them to launch this type of capability in the future.

  10. While those who know me know that I am the first to demand “experiment correctly or do not, there is no try” (apologies to Yoda), I have also been impressed with advances in statistics which can model around some pretty uncontrolled situations.

    It’s one thing to debate how the business should respond to these learnings, and yes, many will misinterpret. But to unilaterally say they should be ignored because they are correlations is throwing away information, information that many fields (from Econ to Sociology to Psychology to yes, even Business) have figured ways to leverage appropriately.

    Given the choice between observation study and controlled experiment, I’ll experiment every time. But given the choice between observation study and hand waving, well, I’ll take the study, esp. if I’m the one setting it up to make sure it’s done right and interpreted correctly.

  11. Great article and great points made in the subsequent comments. Note that attribution solutions that build a true engagement model using 100% of touches of 100% of consumers who were touched are not just capturing correlation, but are predicting causation. They test for this in a couple of ways:

    First- At a macro level, a solution can test against future time periods, since past performance does not necessarily predict future results. So when an attribution model is built, it determines that specific placements, creatives, campaigns, etc., have a certain fractional probability of conversion based on observed behavior. The attribution solution then can predict next month’s (or whatever time period’s) conversions based on the observed behavior and the model. There is typically a model validation process that evaluates this. To the level of accuracy that the model accurately predicts future behavior, marketers can recognize it as valid, and that it doesn’t represent accidental or coincidental artifacts, but true causation.

    That’s how one knows the model is predictive.

    The other distinction is at an ad impression level. Let’s say you see a banner ad, and later go on to convert. A fractional attribution model will figure out the correct attribution for that ad. But how does it truly know that the ad caused you to convert? In order to tease this out, models can differentiate the ad lift from the audience lift by creating a hold-out universe of users where the ad that is displayed is unlikely to affect conversion one way or another.

    So, if you are a Star Wars geek, and someone is selling a new Han Solo action figure, just visiting a page where the ad appeared may mean that you were already likely to buy the figure. But an attribution solution can take 5% of the ads and display a placebo ad (such as for the American Red Cross or other Ad Council charity), and then calculate the difference in conversion rate between the charity ad and the Han Solo ad. This difference is the “net” lift caused by the ad alone, rather than the gross lift caused by the ad plus the audience. That way the marketer can feel confident that the ad caused the lift.

    Essentially, when marketers are interested in net lift, they can just rotate a placebo ad in their ad rotation and let their attribution solution tease out the difference between the effect of the audience and the effect of the ad.

    • Matt Anthony

      “To the level of accuracy that the model accurately predicts future behavior, marketers can recognize it as valid, and that it doesn’t represent accidental or coincidental artifacts, but true causation.”

      This is not true. Just because a model is predictive does not mean it meets the mathematical requisites for proof of causation, since you cannot rule out common effects that are the true “cause” of coincident variation between your variable and the outcome.

        You point out in the latter half of your post that randomized testing allows for causal inference, and this is true, but the reason why is important here. Randomized testing breaks the correlation between treatment assignment and confounding effects that would otherwise skew baseline performance between test and control. Randomization evens these effects out on average (though in any one iteration there could still be imbalance, so one must still check the samples).

  12. Essentially, you do have a “control” group–the 95-99% of exposed users who are NOT converting is a very valuable stream of data. When compared to those who do convert, and then that feedback is looped back into the attribution models, you have deeper intelligence than you did before.

  13. Matt Anthony

    So assuming a completely subjective, equal-weight system is a better alternative?! Sorry, that doesn’t pass the common-sense math test. Have you considered the incentive that an equal-weight system creates for media partners/publishers? A system like you propose incents media to SPAM CONSTANTLY, since the best way to ensure you get at least a piece of a claim on a conversion is to try to message every individual you can!

    I could make a similarly ridiculous argument about your proposed solution. OK, so maybe that algorithmic model doesn’t include every possible variable – but we’re supposed to accept an equal-distribution, rules-based model over the infinite combinations of weights we could use? And do it without any semblance of a model-fit statistic to tell us how good a descriptor your model is?! Why equal – why not 70-20-10, 50-0-50, etc.?

    Sorry to be abrupt, but this is a classical example of the type of thinking that is holding back true discovery. You’re advocating taking assumptions for how things work for granted over using data and statistics to explore and discover them – that is extremely dangerous.