Scale: A Third-Party Data Killer

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Alan Pearlstein, CEO and co-founder of Cross Pixel Inc.

A fundamental difference between offline and online data usage is the associated marketing cost.

Digital ads (banners) delivered to specific audiences across RTB exchanges average around a $3 cost per thousand impressions (CPM). A direct marketer, however, must invest significantly more per customer or prospect to drive a sale via direct mail, typically more than a $600 CPM. These cost constraints mean offline marketers have no choice but to be data-efficient and highly focused. Scale is a very expensive enemy.
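To make the cost gap concrete, here is a quick back-of-the-envelope calculation using the illustrative CPMs above (a sketch only; real campaign costs vary widely):

```python
# Back-of-the-envelope comparison of digital vs. direct-mail reach costs,
# using the illustrative CPMs cited in the column.
DIGITAL_CPM = 3.0        # dollars per 1,000 banner impressions via RTB
DIRECT_MAIL_CPM = 600.0  # dollars per 1,000 direct-mail pieces

def cost_to_reach(people: int, cpm: float) -> float:
    """Total media cost to reach `people` prospects at a given CPM."""
    return people / 1000 * cpm

audience = 100_000
digital_cost = cost_to_reach(audience, DIGITAL_CPM)      # $300
mail_cost = cost_to_reach(audience, DIRECT_MAIL_CPM)     # $60,000
print(f"Digital: ${digital_cost:,.0f}, mail: ${mail_cost:,.0f}, "
      f"ratio: {mail_cost / digital_cost:.0f}x")
```

At these numbers, every thousand people an offline marketer reaches costs 200 times what the same thousand costs online, which is exactly why offline discipline is non-negotiable.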

In the digital world, we see the opposite effect. Rather than apply a rigorous focus and discipline to identifying the most viable targets, digital data suppliers bloat categories to drive scale and increase revenue. Data suppliers, for example, license audiences of auto intenders in segment sizes as large as 20 million users, yet only 7 million to 8 million cars are purchased in the United States each year. This approach commoditizes data, mitigates its effectiveness and is why data buyers complain that third-party data doesn’t work.

Third-party data works if you use it correctly. That usually means making the audience smaller and more targeted.

The Commoditization Of Data

However, it isn’t just cost that is driving commoditization of digital data. There are several other factors in play.

The first is accuracy. Offline shopping and address data is more accurate and reliable than anonymous online, cookie-based behavioral data. When you purchase offline data, such as mailing addresses or catalog shopping activity, you usually reach the desired target. In the online world, which is currently cookie- and browser-based, it is hard to be sure that the ad is being presented to the intended target.

I am not suggesting offline data points are of higher quality; I would actually argue the opposite. Offline data focuses on analyzing past purchases and activity, while online behavior provides predictive data points, based on online research and future purchases. Mailing data is more reliable than digital audience data, although I foresee that this will change when the dominant players, such as Google and Facebook, begin to apply advertising IDs across their extended ecosystems.

Creative is a second problem. Banners are a bad solution, and showing them to a less targeted, bloated audience is an even worse solution. But that’s what digital marketers do. Direct mailers get a printed piece, produced to their desired specs, into the home of their targeted prospect. What do digital buyers receive? A fleeting moment in a banner ad that is ignored. Digital marketers get exactly what they pay for, and it’s not good. The industry needs better creative formats and executions. I think in-stream ad placements are a great move in this direction, but we need a lot more progress.

A third issue is infrastructure constraints, a painful admission given the investment that has been made in ad tech. In the offline world, a direct marketer knows which mailing list generated each sale. That conversion data can be used effectively to plan future mailing efforts and improve targeting and performance.

In the online world, most campaigns are not measured as rigorously. The problem is that it is very difficult for a data-driven buyer to set up campaigns in DSPs so that they can measure and optimize each individual audience against every media opportunity, such as Facebook, display or video. Doing this requires a significant amount of up-front work to set up dozens of individual campaigns, one for each audience target. Buyers don’t have the time, and many end up bundling data sources together into one campaign. This is a serious problem that needs to be addressed by the DSPs and DMPs. If we want to move our industry forward, buyers need to know which audiences work, and they need the capability to learn this easily and efficiently.
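As an illustration of the per-audience accounting the offline world takes for granted, the sketch below tags each impression with the audience segment that triggered it and rolls conversions up per segment rather than per bundled campaign (the segment names and log format are hypothetical; real DSP reporting works differently):

```python
from collections import defaultdict

# Hypothetical impression log: each record notes which licensed audience
# segment triggered the bid and whether a conversion followed.
impressions = [
    {"segment": "auto_intenders", "converted": False},
    {"segment": "auto_intenders", "converted": True},
    {"segment": "in_market_suv", "converted": True},
    {"segment": "in_market_suv", "converted": True},
    {"segment": "auto_intenders", "converted": False},
]

def conversion_rate_by_segment(log):
    """Aggregate conversions per audience segment, not per bundled campaign."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for imp in log:
        shown[imp["segment"]] += 1
        converted[imp["segment"]] += imp["converted"]
    return {seg: converted[seg] / shown[seg] for seg in shown}

rates = conversion_rate_by_segment(impressions)
# e.g. in_market_suv converts on 2 of 2 impressions, auto_intenders on 1 of 3
```

The point is not the code but the grain of measurement: once segments are bundled into one campaign line, this breakdown is lost and no audience can be proven to work.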

It is unfortunate that the knee-jerk reaction to these challenges has been to create oversized, commoditized audiences on the premise that it helps cover flaws in the current system. The truth is that it does the exact opposite.

It’s time to fix the problems the right way and to start thinking small.

Follow Alan Pearlstein (@alanpearlstein), Cross Pixel (@crosspix) and AdExchanger (@adexchanger) on Twitter.



  1. “This approach commoditizes data, mitigates its effectiveness and is why data buyers complain that third-party data doesn’t work.”

    This seems a likely outcome when data generation is disassociated from its application. The data sellers rely on scale while the buyers need precision. With no feedback loop back to the sellers, one should never expect the data to be truly optimal for any given application.

    A better approach would be to include some sort of score, plus a timestamp recording when the inference was made. Everyone should understand that “auto intender” really is just a probability assigned by some model. If the vendors supplied this, smart buyers could decide for themselves exactly how to trade off scale against performance. But of course, this might not be in the sellers’ best interest.

    What to do… what to do…
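The scored, timestamped segment record this commenter describes could look something like the sketch below (a hypothetical schema; as the comment notes, vendors do not actually ship this today):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical segment record: instead of a bare "auto intender" flag,
# the vendor ships the model's probability and when it was inferred.
memberships = [
    {"user": "u1", "score": 0.92, "inferred_at": datetime(2014, 6, 1, tzinfo=timezone.utc)},
    {"user": "u2", "score": 0.31, "inferred_at": datetime(2014, 6, 2, tzinfo=timezone.utc)},
    {"user": "u3", "score": 0.85, "inferred_at": datetime(2014, 1, 5, tzinfo=timezone.utc)},
]

def trim_segment(records, min_score, max_age, now):
    """Let the buyer trade scale for precision: keep only high-score,
    recently inferred memberships."""
    return [r for r in records
            if r["score"] >= min_score and now - r["inferred_at"] <= max_age]

now = datetime(2014, 6, 10, tzinfo=timezone.utc)
keep = trim_segment(memberships, min_score=0.8, max_age=timedelta(days=30), now=now)
# keeps only u1: u2 scores too low, u3's inference is stale
```

With score and recency exposed, the same 20-million-user segment could be trimmed by each buyer to whatever size its economics justify.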

  2. Alan, great article and quite thought-provoking. You have identified the problem correctly, but I would take a different approach to the solution.

    First, “bloated” segments are a good thing. Of course a segment like auto intenders will contain a full spectrum of intent, but many advertisers will want to target lower intent at the top of the funnel. The data providers are delivering a full set, and it is up to the advertisers to filter out what they don’t want.

    Second, separating the data sources is not the solution. Any approach that measures data sources at the campaign level is flawed; treating a data source as a whole will always yield mediocre results. The better approach is to aggressively blend all the data sources, then measure each impression at the level of the individual data points. Systems that do this can discover performance in micro-segments defined by one field from one data provider and two fields from another. Measuring the combinations at the individual impression is the most effective way to accurately identify a high-performing audience, because it effectively filters out the portions of each segment that are not useful to a specific campaign.
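The blended, impression-level measurement this commenter advocates can be sketched as follows (the provider prefixes, field names and tiny log are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical impressions annotated with fields from two data providers
# ("A:" and "B:" prefixes), plus whether the impression converted.
impressions = [
    ({"A:auto_intender": 1, "B:income_high": 1, "B:urban": 1}, True),
    ({"A:auto_intender": 1, "B:urban": 1}, False),
    ({"A:auto_intender": 1, "B:income_high": 1}, True),
    ({"B:income_high": 1, "B:urban": 1}, False),
]

def micro_segment_rates(log, max_fields=2):
    """Measure conversion rate for every field combination up to
    `max_fields`, regardless of which provider each field came from."""
    shown, converted = Counter(), Counter()
    for fields, conv in log:
        names = sorted(fields)
        for k in range(1, max_fields + 1):
            for combo in combinations(names, k):
                shown[combo] += 1
                converted[combo] += conv
    return {c: converted[c] / shown[c] for c in shown}

rates = micro_segment_rates(impressions)
# the pair (A:auto_intender, B:income_high) converts on 2 of 2 impressions,
# while A:auto_intender alone converts on only 2 of 3
```

In a real system the combinatorics and data volumes are far larger, but the principle is the same: cross-provider field combinations, scored per impression, can outperform any single segment taken whole.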

  3. As someone who went from a list management company to an RTB exchange, much of what you’re saying resonates with me. But what we have here is a chicken and egg problem, exacerbated by misaligned incentives.

    Just like squashing click fraud, what you’re proposing is a function of time, will, and marketer buy-in. Audience buys and campaign setups aren’t ever going to get more specific unless marketers signal a willingness to care more and pay more for it. Agencies, by and large, wont want to be the first ones to go to their clients and tell them that they’re getting what they’re paying for and they need to pay more to get better stuff. And CMO’s certainly wont want to go to their CFOs and tell them that they’ve been wasting money and need more money to waste less.