Bad Data Makes Ad Fraud An Even Bigger Problem

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Keith Fagan, vice president of data solutions at sovrn.

While the digital advertising industry is well on its way to mitigating ad fraud via prevention and new cost models that account for it, there is a much wider problem lurking beneath the surface: bad data generated by suspect traffic.

Consider the number and range of JavaScript tags, commonly called pixels, firing on sites today. As an example, I went to a public site and used the Ghostery tool to gain insight into the pixels being used on it (see image). While some pixels are used for monetization of ad inventory and social sharing, others relate to site analytics, data collection and tracking widgets that feed audience targeting and reporting.

If suspect traffic visits this site, the data collected by the pixels will include a mix of human, suspect and nonhuman activity. The bad data collected may adversely affect site analytics and audience targeting segments.

Further complicating the issue is the fact that fraudulent traffic is extremely sophisticated and can mimic human behavior, such as clicking through the site and completing transactions. This is achieved through malware hosted on a user’s device, which mixes nonhuman with human activity, and click farms, which employ people to commit ad fraud. These factors make it difficult to separate human, suspect and nonhuman behavior.

Implications For Data-Driven Marketing

As marketers continue to rely on more data to drive their overall marketing strategy, bad data from suspect traffic may cause bigger problems.

As consumer attention is increasingly fractured across a wider variety of devices, marketing is becoming increasingly data driven to provide a consistent and personalized experience across every channel.

This has given rise to the concept of a digital marketing hub, which not only enables marketers to collect, combine and analyze enterprisewide data, but acts as a central command center used to coordinate activities across channels, such as email, web, social, display advertising, video and so on.

At the heart of many of these digital marketing hubs are data-management platforms (DMPs). Most vendors reviewed by Gartner as part of its Magic Quadrant for Digital Marketing Hubs are DMPs or have DMPs as a central part of their offering.

One of the major sources of data for DMPs is behavioral and intent data from online advertising activity. This is where bad data from ad fraud comes in. It cuts across channels, even affecting mobile apps and social media. Bad data from ad fraud could cloud or falsify the picture of a customer’s cross-channel journey and lead organizations to erroneous conclusions and poor decisions.

Without proper safeguards in place, even simple location-based targeting can be fraught with danger. Bad data already results in the average company losing 12% of its revenue, according to Experian Data Quality, and this will only increase if we’re not careful.

No Silver Bullet

Like any large and complex problem, the answer isn’t a single solution, but a composite of multiple approaches. One method is to use partners that help marketers identify and prevent ad fraud. IAS, Pixalate, Forensiq, comScore, White Ops and DoubleVerify specialize in helping advertisers and publishers identify and avoid suspect traffic and the ad requests it generates. By doing so, they also limit the incidence of bad data.

It’s possible to overlay first-party data on top of third-party data to boost data quality. First-party data, especially from reliable offline channels, is more difficult to fabricate and match back to online data, so overlaying this with online data is a way of validating and improving data quality. However, the quality of the data, its method of collection and low offline-to-online match rates limit the efficacy of this approach.

Marketers should also use good data hygiene algorithms. Easy-to-implement algorithms can identify and quarantine nonhuman data. It’s possible, for example, to detect visitors with an inhuman number of page views in a given period, visitors whose clicks or conversions fall more than a few standard deviations from the data set’s average, or a single user accessing a site from IP addresses all over the world.
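The three checks described above (page-view volume, click outliers and geographically scattered IPs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the VisitorStats record, the field names and every threshold are hypothetical, and real cutoffs would need to be tuned against a site's own traffic.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class VisitorStats:
    # Hypothetical per-visitor summary; field names are assumptions.
    visitor_id: str
    page_views_per_hour: int
    clicks: int
    countries: set  # countries resolved from the visitor's IP addresses


def quarantine_reasons(visitors, max_views_per_hour=300,
                       z_cutoff=3.0, max_countries=3):
    """Return {visitor_id: [reasons]} for visitors that trip any check.

    All thresholds are illustrative defaults, not industry standards.
    """
    clicks = [v.clicks for v in visitors]
    mu = mean(clicks)
    sigma = pstdev(clicks)  # population standard deviation of clicks

    flagged = {}
    for v in visitors:
        reasons = []
        # Check 1: an inhuman number of page views in the period.
        if v.page_views_per_hour > max_views_per_hour:
            reasons.append("page_views")
        # Check 2: clicks more than z_cutoff standard deviations from the mean.
        if sigma > 0 and abs(v.clicks - mu) / sigma > z_cutoff:
            reasons.append("click_outlier")
        # Check 3: one visitor seen from IPs in many countries.
        if len(v.countries) > max_countries:
            reasons.append("geo_spread")
        if reasons:
            flagged[v.visitor_id] = reasons
    return flagged


normal = [VisitorStats(f"u{i}", 12, 2, {"US"}) for i in range(20)]
bot = VisitorStats("bot", 5000, 500, {"US", "BR", "RU", "CN", "DE"})
flagged = quarantine_reasons(normal + [bot])
```

One caveat with this design: extreme visitors inflate the very average and standard deviation they are measured against, so in small samples a median-based cutoff may work better than the simple check shown here.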

Probability scores offer another means of spotting nonhuman traffic. Profile known real human visitors and ad fraud robots to create probability models, which can then be used to assign scores to incoming visitors. Data from visitors that fall below a tolerance level can be blocked or ignored and the data quarantined.
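As a rough sketch of how such scoring might work, the Python below builds one profile from known human visitors and one from known robots, then scores a new visitor. The column does not name a specific model; a simple Gaussian naive Bayes approach is assumed here purely for illustration, and the two features (pages per session, seconds per page), the sample values and the tolerance are all hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_profile(rows):
    """Per-feature (mean, std) for one labeled group of visitors."""
    columns = list(zip(*rows))
    # Floor the std so a constant feature cannot cause division by zero.
    return [(mean(c), max(pstdev(c), 1e-6)) for c in columns]


def log_likelihood(profile, row):
    """Log density of a visitor under independent Gaussian features."""
    total = 0.0
    for (mu, sigma), x in zip(profile, row):
        total += -math.log(sigma * math.sqrt(2 * math.pi))
        total -= (x - mu) ** 2 / (2 * sigma ** 2)
    return total


def human_probability(human_profile, bot_profile, row, prior_human=0.5):
    """Estimated probability that a visitor is human, between 0 and 1."""
    log_h = math.log(prior_human) + log_likelihood(human_profile, row)
    log_b = math.log(1 - prior_human) + log_likelihood(bot_profile, row)
    diff = log_b - log_h
    if diff > 700:  # avoid overflow in math.exp; the score is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def should_quarantine(score, tolerance=0.2):
    """Quarantine data from visitors scoring below the tolerance level."""
    return score < tolerance


# Hypothetical training data: (pages per session, seconds per page).
humans = [(5, 40.0), (8, 65.0), (3, 30.0), (6, 55.0)]
bots = [(200, 0.5), (350, 0.2), (280, 0.4), (150, 0.8)]
human_profile = fit_profile(humans)
bot_profile = fit_profile(bots)

likely_human = human_probability(human_profile, bot_profile, (4, 45.0))
likely_bot = human_probability(human_profile, bot_profile, (300, 0.3))
```

In practice the tolerance is a business decision: set it too high and real customers are discarded, set it too low and robot data leaks into audience segments.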

These approaches aren’t easy today, and data-driven marketing is being pursued by organizations with varying levels of data sophistication and resources. As data-driven marketing matures, I expect more of these approaches to be incorporated into the different solutions offered by marketing vendors.

This is already happening, as can be seen by comScore’s recent announcement that it is working with Turn and MediaMath, two vendors in Gartner’s Magic Quadrant for Digital Marketing Hubs, to embed its anti-ad fraud solution. This trend will only accelerate as marketing and ad tech vendors seek to make their offerings more attractive to marketers in smaller organizations.

Follow sovrn (@sovrnholdings) and AdExchanger (@adexchanger) on Twitter.
