The Digital Advertising Industry Has An Identity And Data Integrity Problem

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Joshua Lowcock, executive vice president and chief digital officer at UM Worldwide.

Despite what you may have been led to believe, the real problem facing the digital industry is not ad fraud. The real problem is identity, compounded by how little effort goes into verifying the quality and integrity of audience data.

If the industry put more effort into verifying identity and audience data, not only would ad fraud be less of a problem, but digital overall would achieve a better ROI.

Fraud exists in advertising for the same reason it exists anywhere: the failure to authenticate identity. Frank Abagnale, of “Catch Me If You Can” fame, started his criminal career by using false identities. Today, it’s even easier, according to Abagnale: “What I did in my youth is hundreds of times easier today. Technology breeds crime.”

Nowhere is that truer than in digital advertising. It’s relatively easy to set up some scripts, access cloud-based infrastructure and generate bots that impersonate human traffic.

What many don’t realize is that nonhuman traffic depends on building fake identities backed by false data profiles. Those false profiles are not adequately reviewed or weighed against real, high-quality data profiles. For example, an advertiser trying to reach beauty customers may use an audience data pool built from one of these data sets, or a blend of them:

  1. Website visitors who have read and/or watched beauty content or video in the past 30 days
  2. Identified customers who purchased beauty products at CVS in the past 30 days
  3. Logged-in Google or Facebook users who have engaged with beauty content in the past 30 days
  4. Non-logged-in people who added beauty products to their shopping carts

Which of these data sets is most likely to be based on real identity and provide meaningful data? Which is easiest to generate bot traffic against?

In the world of digital fraud, the first (traffic) and fourth (anonymous cart abandoners) data sets are easy to fake and should not be trusted to the same degree as the second (retail purchase) and third (Google/Facebook users) data sets. The sad truth is that all of the above are often weighted equally.
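To make the weighting point concrete, here is a minimal Python sketch of what happens when a blended pool is counted at face value versus discounted by how verifiable each data set’s identity is. The segment sizes and the 30 percent discount for unverified data sets are invented for illustration. They are not industry figures, and the function name `trust_weighted_profiles` is hypothetical, not a method any vendor or standards body uses.

```python
from dataclasses import dataclass

# Hypothetical trust discounts, in percent. These values are illustrative
# assumptions, not an industry standard or a documented scoring method.
TRUST_PCT = {"verified": 100, "unverified": 30}


@dataclass(frozen=True)
class DataSet:
    name: str
    profiles: int  # hypothetical number of profiles in the segment
    basis: str     # "verified" (real identity) or "unverified" (easy to fake)


def trust_weighted_profiles(pool: list[DataSet], weighted: bool = True) -> int:
    """Count profiles in a blended pool, optionally discounting unverified sets."""
    total = 0
    for ds in pool:
        # Equal weighting treats every data set as 100 percent trustworthy.
        pct = TRUST_PCT[ds.basis] if weighted else 100
        total += ds.profiles * pct // 100
    return total


# Hypothetical segment sizes mirroring the four beauty data sets above.
pool = [
    DataSet("Beauty content visitors, past 30 days", 400_000, "unverified"),
    DataSet("CVS beauty purchasers, past 30 days", 100_000, "verified"),
    DataSet("Logged-in Google/Facebook beauty engagers", 200_000, "verified"),
    DataSet("Non-logged-in beauty cart abandoners", 300_000, "unverified"),
]

print(trust_weighted_profiles(pool, weighted=False))  # 1000000
print(trust_weighted_profiles(pool, weighted=True))   # 510000
```

Under equal weighting the pool looks like a million beauty prospects. Discounting the two easy-to-fake sets roughly halves it. Whatever the right discounts turn out to be, the point is that they should be a deliberate choice rather than an implicit 100 percent for every data set.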

Do brands know which data sets they are using, or whether quality data is being blended with the data equivalent of the subprime mortgages in “The Big Short”? In my experience, more time is spent agonizing over and reviewing the lists of sites and apps in a media buy than on the quality of the data used to inform the buy.

There’s even less time given to ensuring that the same data set is applied consistently from campaign to campaign.

Why does this happen? Sticking with “The Big Short” analogy, it’s easy to verify whether a house or site exists when you can just drive by or type in the URL. But it’s a lot harder to verify the person who owns the house unless you knock on the door. And by the time you find out the dog owns the house, it’s too late: you’ve been scammed. The industry needs to spend more time knocking on doors to find out what identity and data are informing a buy, instead of only checking which publishers are on the buy list.

A compulsion to review sites and apps in the buy is only part of the problem. The real issue is that the industry doesn’t validate identity and data because it hasn’t evolved beyond defining audiences in simple demographic terms, such as “F 25-54” for a beauty intender. Despite what digital offers, audience definitions are often loose and vague. There is no diligence, and no proper score or index, to determine which data should be relied upon to define and buy the audience. This is something clients and agencies must jointly own.

There are clear benefits to putting in this extra effort. Look at Google and Facebook, which get a disproportionate share of digital ad dollars each year. They have identity baked into their YouTube, Facebook and Instagram advertising platforms, and they employ dedicated engineering teams to battle false identities. Their audience data has integrity. This is why Google is now publicly making noise about building identity into its DoubleClick stack. Digital ad dollars will go to platforms where identity can be validated and data trusted.

Plus, the more you rely on identity and quality data, the better the business results. Why do Facebook and YouTube always show higher ROI in attribution and marketing mix modeling studies? It’s because they are operating from a base of real identity and quality data, not probabilistic data pools. The greater the ROI, the more ad dollars you get. Imagine how the rest of a brand’s digital media would look if all of its digital buys were based on deterministic identity and data integrity. Better yet, imagine the business results!

So how do we, as an industry, move this forward? Linking campaign objectives to real business outcomes from real customers is the first step. I’m not talking about direct response, but about moving the needle of a business based on real metrics, such as sales. Business results help make data accountable and indirectly drive data integrity.

More broadly, the industry needs to stop and think about identity and data integrity. It’s time for everyone to start demanding details on what data is being used to inform digital media buys. The industry should establish a Data Integrity Group, just as it formed the Trustworthy Accountability Group to fight fraud, so that everyone is collectively aligned on what constitutes data integrity and quality and, as a result, what constitutes a quality audience.

We must then apply data integrity consistently across every digital activation. By doing so, not only will we help combat fraud, but digital will also generate a better ROI.

Follow UM Worldwide (@UMWorldwide) and AdExchanger (@adexchanger) on Twitter.



  1. Randall Tinfow

    I have my doubts about Google being able to deliver robot free data. GA is a huge mess requiring regular intervention to weed out the bogus referrers.

  2. Joshua, we have not yet had the pleasure of meeting, but thank you for sharing this truth. There is as much work to be done on transparency, quality and integrity from a data standpoint as there has been from an inventory, financial and fraud perspective. Unless we talk about it, it does not get better. Transparency does not start and stop with a promise to be transparent. We need standards and tools to measure across the board.

  3. If the Publisher gets 40 cents and the Advertiser pays a dollar, exactly who in the middle is actually adding the 150% of value to the 40 cents?
    We’re with you on the migration beyond marketing. There are decent actors, with real business activities, adding demonstrable and measurable value. No nonsense. Matching activities and spend to customers in client databases.

  4. Joshua,

    Thank you for this article. It is critical that marketers and agencies demand more information from providers about data quality. We believe the advertising industry’s steadfast focus on scale is what prompted data providers to prioritize scale over quality, to the detriment of the marketplace.

    I am glad to report this is changing, albeit slowly. Marketers and agencies are starting to ask good questions about data sourcing, identity linking methodology, checking the data for non-human traffic, and other hygiene best practices. There is also consistent discussion at the highest level of the data industry about how we can offer solutions and highlight the quality players.

    Full disclosure, my company provides deterministic audience data to the ecosystem as well as an enterprise data hygiene solution. We work with sophisticated media buyers to make sure they are asking all their data providers the right questions to determine identity and quality. We agree that an industry body like a Data Integrity Group would be very helpful in this area.

    Data providers will be part of the solution. And we look forward to working with you, your agency colleagues and others to make this happen.

  5. Joshua, I agree with your assessment here: agencies today are ill-equipped to truly vet data quality and thus focus on easy proof points. However, the convoluted value chain of programmatic media puts so many intermediaries between advertiser and publisher that truly understanding who is engaging with an ad is easy for me, but difficult for your clients. We need to work together in this industry to straighten out the pipes. Today’s verification tools are a meager Band-Aid that further confuses the issue and supports the spaghetti in the system.

  6. In other words, the future belongs to direct relationships between individuals (at the point of data collection) and the platforms serving the ads, or between individuals and brands where possible. Everything else is not only an overstretch but also poised to be deemed illegal in the very short term, for simple reasons of privacy and identity control. Second- and third-party data has little future insofar as individuals do not consent to such onboarding, deduplication and clean-up, no matter how much we audit it. Content, distribution, audiences and marketing are finally evolving, and we should all probably leave 2002 behind! 🙂