The Digital Advertising Industry Has An Identity And Data Integrity Problem

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Joshua Lowcock, executive vice president and chief digital officer at UM Worldwide.

Despite what you may have been led to believe, the real problem facing the digital industry is not ad fraud. The real problem is identity, compounded by the lack of quality and integrity put into verifying audience data.

If the industry put more effort into verifying identity and audience data, not only would ad fraud be less of a problem, but digital overall would achieve a better ROI.

Fraud exists in advertising for the same reason it exists anywhere: the failure to authenticate identity. Frank Abagnale, of “Catch Me If You Can” fame, started his criminal career by using false identities. Today, it’s even easier, according to Abagnale: “What I did in my youth is hundreds of times easier today. Technology breeds crime.”

It couldn’t be truer in the digital industry. It’s relatively easy to set up some scripts, access cloud-based infrastructure and generate bots that impersonate human traffic.

What many don’t realize is that nonhuman traffic is based on building fake identities full of false data profiles. The false data profiles are not adequately reviewed or valued against real and quality data profiles. For example, an advertiser trying to reach beauty customers may use an audience data pool consisting of one of, or a blend of, these data sets:

Website visitors who have read and/or watched beauty content or video in the past 30 days
Identified customers who purchased beauty products at CVS in the past 30 days
Logged-in Google or Facebook users who have engaged with beauty content in the past 30 days
Non-logged-in people who added beauty products to their shopping carts

Which of these data sets is more likely to be based on real identity and provide meaningful data? Which is easier to generate bot traffic against?

In the world of digital fraud, the first (site traffic) and fourth (anonymous cart abandoners) data sets are easy to fake and should not be given the same level of trust as the second (retail purchases) and third (logged-in Google/Facebook users) data sets. The sad truth is that all four are often weighted equally.
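To make the weighting point concrete, here is a minimal sketch of how a blended audience pool might be scored by the trustworthiness of its sources rather than treated as uniform. Everything in it is an assumption for illustration: the source names, the trust weights and the segment sizes are hypothetical, since the ranking of the four data sets comes from the argument above but no numbers do.

```python
from dataclasses import dataclass

# Hypothetical trust weights between 0 and 1. Only the ordering reflects
# the column's argument; the values themselves are made up.
TRUST_WEIGHTS = {
    "site_visitors": 0.2,      # anonymous content viewers, easy for bots to fake
    "retail_purchasers": 1.0,  # identified customers with a real purchase
    "logged_in_users": 0.9,    # logged-in Google or Facebook users
    "anonymous_carts": 0.2,    # non-logged-in cart additions, easy to fake
}


@dataclass
class Segment:
    source: str  # key into TRUST_WEIGHTS
    size: int    # number of profiles contributed by this source


def blended_trust(segments):
    """Return the size-weighted average trust of an audience pool."""
    total = sum(s.size for s in segments)
    if total == 0:
        raise ValueError("audience pool is empty")
    return sum(TRUST_WEIGHTS[s.source] * s.size for s in segments) / total


# Hypothetical pool of one million profiles, dominated by anonymous traffic.
pool = [
    Segment("site_visitors", 600_000),
    Segment("retail_purchasers", 100_000),
    Segment("logged_in_users", 200_000),
    Segment("anonymous_carts", 100_000),
]

print(f"Blended trust score: {blended_trust(pool):.2f}")
```

Weighting every source equally would hide the fact that most of this hypothetical pool comes from the two sources that are easiest to fake. A score like this is only as good as the weights behind it, which is exactly the kind of thing clients and agencies would need to agree on.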

Do brands know which data sets they are using, or whether quality data is being blended with “The Big Short” equivalent of subprime data? In my experience, more time is spent agonizing over and reviewing the lists of sites and apps in a media buy than on the quality of the data used to inform the buy.

There’s even less time given to ensuring that the same data set is applied consistently from campaign to campaign.

Why does this happen? Sticking with “The Big Short” analogy, it’s easy to verify if a house or site exists when you can just drive by or type in the URL. But it’s a lot harder to verify the person who owns the house unless you knock on the door. And by the time you find out the dog owns the house, it’s too late – you’ve been scammed. The industry needs to spend more time knocking on doors to find out what identity and data are informing a buy, instead of looking at which publishers are on the buy list.

A compulsion to review sites and apps in the buy is only part of the problem. The real issue is that the industry doesn’t validate identity and data because it hasn’t evolved beyond defining buying audiences in simple demographic terms, such as “F 25-54” for a beauty intender. Despite what digital offers, audience definitions are often loose and vague. There is no diligence, and no proper score or index, for deciding which data should be relied upon to define and buy the audience. This is something clients and agencies must jointly own.

There are clear benefits from putting in this extra effort. Look at Google and Facebook. They get a disproportionate share of digital ad dollars each year. Google and Facebook have identity baked into their YouTube, Facebook and Instagram advertising platforms and dedicated engineering teams battling false identities. Their audience data has integrity. This is why Google is now publicly making noise about building identity into its DoubleClick stack. Digital ad dollars will go to platforms where identity can be validated and data trusted.

Plus, the more you rely on identity and quality data, the better the business results. Why do Facebook and YouTube always show higher ROI in attribution and mixed-media modeling studies? It’s because they are operating from a base of real identity and quality data, not probabilistic data pools. The greater the ROI, the more ad dollars you get. Imagine how the rest of a brand’s digital media would look if all of its digital buys were based on deterministic identity and data integrity. Better yet, imagine the business results!

So how do we, as an industry, move this forward? Linking campaign objectives to real business outcomes from real customers is the first step. I’m not talking about direct response, but about moving the needle of a business based on real metrics, such as sales. Business results help make data accountable and indirectly drive data integrity.

More broadly, the industry needs to stop and think about identity and data integrity. It’s time for everyone to start demanding details on what data is being used to inform digital media buys. The industry should establish a Data Integrity Group, just like it formed the Trustworthy Accountability Group to fight fraud, so that it’s collectively aligned on what constitutes integrity and quality of data and, as a result, audiences.

We must then apply data integrity consistently across every digital activation. By doing so, not only will we help combat fraud, but digital will also generate a better ROI.

Follow UM Worldwide (@UMWorldwide) and AdExchanger (@adexchanger) on Twitter.