If You Want To Measure Incrementality, Do It Right

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Sebastien Blanc, general manager, US, at Struq.

Considering the number of channels where marketers can spend their budgets, understanding and proving a return on ad spend (ROAS) is vital. If ROAS is properly understood, then marketing budgets shouldn’t be capped.

If you can drive revenue above the cost of your advertising, why would you stop? Unfortunately, CMO budgets are capped because understanding incremental revenue is nontrivial.

Incremental revenue is defined as revenue that would not have occurred without a specific campaign, all else being equal. It is a view radically different from, and more reliable than, last-click revenue.

The concept of incrementality is still maturing, and different tests with the same name actually cover very different realities, ranging from an accurate depiction of the truth to pure science fiction. But while testing incrementality is littered with pitfalls, such as misallocating users or premature decision-making, there are two methodologies that can help avoid them.

Different Approaches

An incremental revenue test compares the average revenue driven by users from two groups: those assigned to a retargeting group vs. users in a control group.

When and how users are assigned is critical. You never want to compare people who saw an ad to those who did not. This would flatter results, since you would only show ads to the highest-value users. We need both groups to have the same blend of users, some highly engaged and others less engaged.

The first step is to split your pool of users into two groups: one to be retargeted in the normal way and a control group. There are two ways to split the groups that avoid the problem of comparing “apples with pears.”

You can split new users randomly as they land on the site and only show ads to the half that makes up the retargeting group (bear in mind that you will decide not to show ads to some users in this group). Never show ads to the other half of users, which comprises the control group.

This methodology has the advantage of not triggering any media cost for users in the control group, making the experiment slightly cheaper. Revenue per user is computed using all the conversions happening in each group – therefore ignoring notions like last click and last impression.

Because this set-up takes all users into account, the methodology drives the most reliable results when budgets are close to the maximum theoretical budget. If you decide to spend only 10% of the maximum deliverable budget, results are likely to be way below the potential incremental revenue of your program.
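This first set-up can be sketched in a few lines. This is a minimal illustration, not the article's implementation; the function names and the 90/10 split share are assumptions. Assignment happens once, as the user lands, and sticks; revenue per user counts every conversion in each group.

```python
import random

# Minimal sketch of the landing-time split (illustrative names; the
# 90/10 share is an assumption, not from the article).

def assign_group(user_id, control_share=0.10, seed="incrementality-test-1"):
    """Assign a user once, as they land on site. The choice is sticky
    because it is derived deterministically from the user id."""
    rng = random.Random(f"{seed}:{user_id}")
    return "control" if rng.random() < control_share else "retarget"

def revenue_per_user(conversions, users):
    """Every conversion in a group counts, ignoring notions like
    last click or last impression entirely."""
    if not users:
        return 0.0
    return sum(conversions.get(u, 0.0) for u in users) / len(users)

# Incremental revenue per user is then simply:
#   revenue_per_user(conv, retarget_users) - revenue_per_user(conv, control_users)
```

Deriving the assignment from the user ID means no lookup table is needed: the same user always falls in the same group, and control users are never served.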

You can also split users randomly at the point of ad serving, thereby showing ads to users in the retargeting ad group and charity ads to those in the control group. This methodology drives more reliable results at lower levels of spending and is easier to track on a daily basis. Because ads are shown to users, the control group will incur additional media costs that will decrease the ROAS of your program during testing. This methodology is what most marketers go for because you also help a charity in the process.
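The serve-time split looks similar, with one important detail: even though the decision is made at ad-serving time, it must remain sticky per user so that control users only ever see the charity creative. One common way to get a stable bucket without maintaining an assignment table, assumed here for illustration, is to hash the user ID:

```python
import hashlib

# Sketch of the serve-time split (the hashing approach is an assumed
# technique, not from the article): the same user always lands in the
# same bucket, so control users only ever receive the charity creative.

def creative_for(user_id, control_share=0.10):
    # Hash the user id into a number in [0, 1); stable across ad
    # requests, with no assignment table to maintain.
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "charity_ad" if bucket < control_share else "retargeting_ad"
```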

Neither methodology is perfect, so marketers need to choose based on specific sites and goals.

The Right Set-Up For Your Goals

As in most things digital, the devil lies in the detail. The first critical aspect involves understanding how many conversions are needed in the control group for the results to be reliable. There are several simulators out there to help you compute the right sample size. Keep in mind that before you hit this threshold, it is impossible to rely on any result because small samples often produce dramatic results, either positive or negative.
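As a rough illustration of the math such simulators perform, here is a standard sample-size calculation for comparing two conversion rates under the normal approximation. The z-values are hardcoded for 95% confidence and 80% power, and the conversion rates in the example are made up, not from the article:

```python
import math

# Rough sample-size sketch for a two-proportion test (normal
# approximation; z-values for 95% confidence and 80% power).

def sample_size_per_group(p_control, p_retarget, z_alpha=1.96, z_beta=0.84):
    """Users needed in each group to reliably detect the lift from
    p_control to p_retarget."""
    p_bar = (p_control + p_retarget) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_retarget * (1 - p_retarget))) ** 2
    return math.ceil(numerator / (p_retarget - p_control) ** 2)

# e.g. detecting a lift from a 2.0% to a 2.3% conversion rate:
# sample_size_per_group(0.020, 0.023)
```

The formula makes the article's point concrete: the smaller the lift you need to detect, the more conversions the control group must accumulate before any result is trustworthy.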

Besides statistical significance, it is also important to include at least one full decision cycle in your experiment, preferably two. If you know that customers usually take seven days to buy one of your products, then ideally the incrementality test must last at least 14 days to include two full cycles.

Most marketers want to go for a 50/50 split of users. Even though it might sound more rigorous, it does not actually make results more reliable or easier to interpret. It instead limits the revenue-generating power of your campaign. On a website receiving more than 1 million visitors per month, you can reach statistical significance in a few weeks with a 90/10 split, maximizing revenue at the same time as you measure incremental revenue.

Finally, it is important to make sure the control group is not contaminated, meaning that no user in the control group should ever see a retargeting ad. You can guarantee that by only populating the control group with brand new users.

Being able to measure incremental revenue in an accurate way is the key to maximizing your growth as a retailer. Since each situation is unique, make sure you study your goals thoroughly and agree on the best possible methodology with your vendor before starting any test.

Follow Struq (@struq) and AdExchanger (@adexchanger) on Twitter.



  1. Thanks for highlighting a concept really important to ROI. I am a big fan of incrementality measurement, as it provides a solid basis for understanding how many conversions were actually caused by a given marketing initiative. Too many of the measurement techniques in place today only look at correlation (i.e., user touches) and then assign ROI on that basis. The correlation approach doesn’t answer the question: “How many users would have converted on their own?” As Sebastien points out, this requires a carefully designed experiment to measure causation.

    A couple of thoughts to add here:

    1) It is essential to make sure that the test and control group are completely random. It is very easy to introduce bias into your experiment by taking a shortcut with randomization. One easy way to create good randomization is by using an nth record assignment technique. If your control group is 10% and your experimental group is 90%, then every 10th incoming record gets assigned to the control group. Millisecond timestamps on records also provide a reasonable way to create a test and control group. Just take the last digit in the timestamp and assign the record to the control group if the last digit is a 9, otherwise it is in the test group.

    2) The other challenge here is capturing the value of online-to-offline conversions. Intuitively we “know” that some of the value of online impressions is realized offline or through alternative channels, but it is tricky to measure this path to conversion as users hop between devices and channels. A true measure of sales lift must incorporate all conversions – cross channel, cross device – or you will significantly underestimate the net impact.

    One data point – when TruSignal measures the net incremental impact of digital campaigns, we typically see only 10% of the conversions being captured by an online conversion pixel. Granted, we are focused on mid-funnel prospecting campaigns that have a much longer time to conversion than in-market campaigns, but the point is that the longer the user’s path to conversion, the greater the fall-off between actual value and online pixel-measured value. Cookies crumble, devices get swapped, and the user picks up the phone or walks into a local branch to complete the sale.

    One way to address this challenge is to compare our test and control targeting audiences against actual CRM new-sales data in an offline match process. No online conversion pixels are needed. By doing away with real-time pixels to measure performance and matching against actual sales data, one can measure the true value of online campaigns – regardless of the last-touch channel or device. With a test and control group in place, one can measure the incremental new customers, as Sebastien points out in his post. This is the true basis for ROAS.
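    The two randomization shortcuts described in point 1 above can be sketched in a few lines of Python (illustrative only; the function names are hypothetical):

```python
# Sketch of the two randomization shortcuts from the comment above:
# nth-record assignment and timestamp-last-digit assignment, both
# producing a 10% control group (illustrative names).

def nth_record_group(record_index, n=10):
    """Every nth incoming record joins the control group."""
    return "control" if record_index % n == 0 else "test"

def timestamp_digit_group(timestamp_ms):
    """A millisecond timestamp whose last digit is 9 -> control (10%)."""
    return "control" if timestamp_ms % 10 == 9 else "test"
```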

    • Siyun.F

      @ David

      Well put! On your second point: that is exactly how we do online-to-offline sales impact measurement for some of the big retailers in the US. Additionally, we use three measurement periods: before, during and post. This way, we can observe any changes in buyers as a percentage of exposed users, and the contribution from new buyers vs. repeat buyers. More importantly, we quality-control both the test and control groups to ensure users in the two groups are “equal” – similar demographics, lifestyle, etc. – and hence have a comparable propensity to convert or purchase. That lets us say with confidence that the incremental sales are attributable to the campaign itself, not influenced by selection bias.

  2. How often should you re-run incrementality tests to confirm your results are still holding true? And if you are running campaigns in many different countries, should you run incrementality tests in all of them, or just some?