Inaccurate Segments May Be Costing Advertisers Billions

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Chris Kelly, founder and CEO at Survata.

We’ve all read the doom-and-gloom news about programmatic problems, from YouTube’s brand safety issues to brand advertisers culling their spending and companies like Chase maintaining performance with drastically reduced ad placements. We’ve seen predictions of the death of programmatic as the future of digital marketing, and then predictions of the death of those predictions. Dizzying.

The one good thing resulting from the discussion has been the honest reflections on how programmatic can grow up.

However, that’s where it largely stopped, and I can’t help but notice that we only addressed half of the equation. While the industry collectively groaned about the “where” within programmatic advertising – where the ads show up – we haven’t sufficiently reflected on its “who,” as in who is seeing the ads.

Do the audience segments that power programmatic contain the people they’re labeled as containing?

It’s a fair question. As New York Times CEO Mark Thompson recently wondered, “When we say a member of the audience is a female fashionista aged 20 to 30, what’s the probability that that’s actually true?”

The reality is that it may be quite low. We’ve been so consumed with brand safety and fixing programmatic spray-and-pray approaches that we haven’t really thought about segment validation. Are we sure a segment of “in-market SUV buyers” contains a larger percentage of buyers than a randomized control group? How can we prove that?

Perhaps this “who” question is programmatic’s next dirty little secret. Data scientists creating segments have many economic incentives to make a segment larger, but few to make it more accurate. And they’re allowed to sell it as a black box. Under these circumstances, we can’t expect consistently accurate audiences.

So, how can we get ahead of this issue before it rises to “crisis” levels, like the brand safety scandals?

The first step is admitting there is a problem. Based on industry chatter, we’re already in the first phases. It’s time to dig in and ask tough questions.

Sanity Check, Please

Sometimes the math behind a segment just isn’t there. I’ve seen a segment that supposedly contained US small business owners and was larger than the Census Bureau’s count of small business owners multiplied by any reasonable device-per-person ratio. Even setting aside the impossibility of membership rates above 100%, is it plausible that a model captured even 70% of the people in a given category? Sanity-checking audience sizes against a quick Google search should be a first move, especially when millions of advertising dollars are at stake.
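The size check described above is simple arithmetic, and it can be sketched in a few lines of Python. Every number below is made up for illustration, the function names are hypothetical, and the 70% cutoff is borrowed from the question above rather than from any industry standard.

```python
def max_plausible_devices(population: int, devices_per_person: float) -> float:
    """Upper bound on segment size if every person in the category were captured."""
    return population * devices_per_person


def implied_capture_rate(segment_devices: int, population: int,
                         devices_per_person: float) -> float:
    """Share of the real-world category the segment must capture to reach its stated size."""
    return segment_devices / max_plausible_devices(population, devices_per_person)


# Hypothetical figures only; substitute a published population count
# (for example from the Census Bureau) and the vendor's stated segment size.
population = 1_000_000        # people in the category (made up)
devices_per_person = 3.0      # assumed device-per-person ratio (made up)
segment_devices = 4_500_000   # vendor's stated segment size (made up)

rate = implied_capture_rate(segment_devices, population, devices_per_person)
if rate > 1.0:
    verdict = "impossible: larger than the whole category"
elif rate > 0.7:  # cutoff is an assumption, not an industry standard
    verdict = "implausible: the model would have to capture most of the category"
else:
    verdict = "passes the size check"

print(f"implied capture rate: {rate:.0%} -> {verdict}")
```

A segment that passes this check is not necessarily accurate. The check only rules out sizes that cannot be right.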

Understand Different Data Types

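The “who” within advertising segments is generally determined by three types of data: declared, observed or inferred. Many may know that, but as a refresher: declared data is what people state about themselves, observed data is recorded from their actual behavior and inferred data is modeled from other signals.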

Understanding the different types is critical because programmatic segments are built from one or more of these types. Yet brands often don’t ask their partners which is which. I’ve often seen a brand think a segment it’s getting is observed when it’s actually inferred. That matters, especially if you don’t know the criteria that determined the inferences.

Check The Source

I’ve heard horror stories about agencies doing a forensic dive into segments to learn that people who read about a car crash were put into an “automotive intenders” segment. I’m convinced these stories are more true than apocryphal. Brands should find out the precise criteria for being put in a given segment. What trade-offs were made between accuracy and scale? Don’t let it be a black box.

Building audiences is hard. The challenges are significant: Offline data may be available only at the household level, family members may share devices and modeling may be unavoidable in many cases. So, the expectations shouldn’t be that a segment contains only people with a specific attribute, but that it contains significantly more people with that attribute than an untargeted group.
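One way to test that expectation, sketched here under stated assumptions, is to survey a sample from the segment and a sample from an untargeted group, then compare the share in each that reports the attribute. The example uses a one-sided two-proportion z-test. The function name and the survey counts are hypothetical, and this is an illustration of the idea, not a description of any vendor’s method.

```python
import math


def lift_and_p_value(seg_hits: int, seg_n: int, ctrl_hits: int, ctrl_n: int):
    """Compare attribute rates in a segment sample and an untargeted sample.

    Returns (lift, z, p_value), where p_value is the one-sided probability of
    seeing a gap this large if the segment were no better than the control.
    Assumes both samples are random and ctrl_hits is greater than zero.
    """
    p_seg = seg_hits / seg_n
    p_ctrl = ctrl_hits / ctrl_n
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (seg_hits + ctrl_hits) / (seg_n + ctrl_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / seg_n + 1 / ctrl_n))
    z = (p_seg - p_ctrl) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_seg / p_ctrl, z, p_value


# Made-up survey results: 120 of 1,000 segment members say they are in market
# for an SUV, versus 80 of 1,000 people in the untargeted group.
lift, z, p_value = lift_and_p_value(120, 1000, 80, 1000)
print(f"lift: {lift:.2f}x, z: {z:.2f}, one-sided p-value: {p_value:.4f}")
```

A lift above 1 with a small p-value suggests the segment really does hold more of the target audience than an untargeted group. A lift near 1 suggests the label adds little.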

Even acknowledging those difficulties, I can’t help but think of the billions of dollars wasted over the years marketing to incorrectly targeted audiences. Programmatic spend is nearing $33 billion in the US alone this year. It’s hard to know precisely how many media dollars are powered by third-party data versus first-party data, but even conservatively assuming that 10% to 20% of third-party segments are invalid implies billions of media dollars are suboptimally deployed.
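The back-of-envelope math can be written out as a small sketch. Only the $33 billion figure and the 10% to 20% range come from the text above. The third-party share is unknown, so the values tried below are purely hypothetical, and the sketch assumes that the share of invalid segments translates proportionally into spend.

```python
PROGRAMMATIC_SPEND_USD = 33e9  # US programmatic spend cited in this column


def misdirected_spend(total_spend: float, third_party_share: float,
                      invalid_rate: float) -> float:
    """Spend behind invalid third-party segments, assuming spend is spread
    evenly across valid and invalid segments (a simplifying assumption)."""
    return total_spend * third_party_share * invalid_rate


# The share of spend powered by third-party data is not known;
# these values are hypothetical scenarios, not estimates.
for share in (0.25, 0.50, 1.00):
    low = misdirected_spend(PROGRAMMATIC_SPEND_USD, share, 0.10)
    high = misdirected_spend(PROGRAMMATIC_SPEND_USD, share, 0.20)
    print(f"third-party share {share:.0%}: "
          f"${low / 1e9:.2f}B to ${high / 1e9:.2f}B suboptimally deployed")
```

The result is sensitive to the third-party share, which is why pinning that number down matters.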

Follow Survata (@Survata) and AdExchanger (@adexchanger) on Twitter.
