Machine Learning Needs Good Data To Reach Its Potential

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Chris Dobson, CEO at The Exchange Lab.

Machine learning is revolutionizing digital advertising, driving efficiency through advanced algorithmic decision-making, and could generate $42 billion in annual ad spend by 2021.

To get the best out of these technologies, however, marketers need to be aware of potential inaccuracies in the data they are using. They must also understand the limitations inherent in their data sources and figure out how to overcome them.

Using machine learning, marketers can theoretically optimize advertising campaigns to specific business outcomes, such as conversions, by creating a closed loop where performance insights are fed back into the decision-making process. Advanced algorithms determine which ad placements are most effective at driving the required results, and these insights can be used to increase spend on the best-performing sites and generate maximum ROI.
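As a rough illustration of that closed loop, the sketch below shifts the next period's budget toward the sites that produced the most conversions per dollar. It is a deliberately simple proportional rule, not how any particular platform bids, and the function name, site names and figures are all hypothetical.

```python
from typing import Dict


def reallocate_budget(
    conversions: Dict[str, int],
    spend: Dict[str, float],
    total_budget: float,
) -> Dict[str, float]:
    """Split the next budget in proportion to conversions per dollar.

    Illustrative assumption: a plain proportional rule stands in for
    whatever optimization a real buying platform applies.
    """
    # Only sites that actually received spend can be scored.
    scored = [site for site, amount in spend.items() if amount > 0]
    if not scored:
        return {}

    efficiency = {site: conversions.get(site, 0) / spend[site] for site in scored}
    total_efficiency = sum(efficiency.values())

    if total_efficiency == 0:
        # No conversions anywhere: no signal, so split the budget evenly.
        return {site: total_budget / len(scored) for site in scored}

    return {
        site: total_budget * value / total_efficiency
        for site, value in efficiency.items()
    }


# Hypothetical figures for one reporting period.
last_spend = {"site_a": 100.0, "site_b": 100.0, "site_c": 100.0}
last_conversions = {"site_a": 6, "site_b": 3, "site_c": 1}
next_budget = reallocate_budget(last_conversions, last_spend, 300.0)
```

With these made-up numbers, site_a receives roughly 180 of the 300 budget, site_b roughly 90 and site_c roughly 30. The loop is only as good as the conversion counts fed into it, which is the concern raised next.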

It’s important for marketers to remember that the performance data used by machine learning can be inaccurate and incomplete. Algorithms can only optimize against the data they are given, so if this information is flawed, advertising campaigns won’t reach their full potential. Comparing the data buy-side platforms use to optimize campaigns with the data advertisers use to measure performance reveals some significant discrepancies.

Demand-Side Platform View Vs. Ad-Server View

When a demand-side platform (DSP) buys ad space, it is informed by multiple data streams, including insights on conversions and sales.

When consumers book a holiday online, for example, the confirmation page will contain a conversion pixel from the DSP. The pixel tracks the touch points consumers were exposed to along their paths to purchase, and those interactions receive conversion credit. Machine learning algorithms then use this data to determine which inventory to bid on for future placements and how much to bid.

However, the DSP is not the only provider to place tracking pixels on conversion pages. There will also be a pixel from the ad server, such as Google’s DoubleClick Campaign Manager, which advertisers use to measure campaign performance. While the DSP sees only its own digital activity and uses the limited data from that activity to feed its algorithm, the ad server has a more holistic view of all digital media activity, including search, other media buyers and the original DSP. This is why marketers should be cautious about relying too heavily on data from the DSP alone.

Dramatically Diverse Results

Comparing these two data sets, the more comprehensive attribution data from the ad server and the less complete performance data from the DSP, shows how accurate the data feeding machine learning algorithms really is, and the differences can be remarkable.

I’ve seen cases where the DSP and ad server agree that the same impression is responsible for a conversion only half the time. The DSP may record more desktop conversions than smartphone conversions, while the ad server may report the opposite. And in many cases, the DSP may record a conversion on a website where the ad server detects no conversions at all.
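A comparison of this kind can be sketched in a few lines. The example below assumes each system exports a mapping from a shared conversion ID to the impression it credited; real exports rarely line up this neatly, and the IDs, log contents and function name are invented for illustration.

```python
from typing import Dict


def attribution_agreement(dsp: Dict[str, str], ad_server: Dict[str, str]) -> float:
    """Share of conversions seen by both systems that credit the same impression."""
    # Only conversions recorded by both systems can be compared.
    shared = dsp.keys() & ad_server.keys()
    if not shared:
        return 0.0
    matches = sum(1 for conv_id in shared if dsp[conv_id] == ad_server[conv_id])
    return matches / len(shared)


# Hypothetical logs: conversion ID -> credited impression ID.
dsp_log = {"c1": "imp_10", "c2": "imp_11", "c3": "imp_12", "c4": "imp_13", "c5": "imp_14"}
ad_server_log = {"c1": "imp_10", "c2": "imp_99", "c3": "imp_12", "c4": "imp_77"}

agreement = attribution_agreement(dsp_log, ad_server_log)  # 2 of 4 shared -> 0.5

# Conversions the DSP recorded that the ad server never saw.
dsp_only = sorted(dsp_log.keys() - ad_server_log.keys())  # ["c5"]
```

In this toy data the two systems agree on two of the four conversions they both recorded, and the DSP claims one conversion the ad server has no record of.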

Assuming the ad server data is more accurate, due to its more complete overview of digital activity, these discrepancies mean machine learning algorithms are optimizing campaigns based on imprecise data, resulting in suboptimal performance. With the data on the two sides of the ecosystem matching only half the time, buy-side platforms may optimize against conversion volumes that differ by as much as a factor of five from what the advertiser records.

Factoring In Data Discrepancies

It would be a mistake for marketers to try to anticipate these inconsistencies by assuming a fixed percentage of inaccuracy in DSP data. The discrepancies vary from campaign to campaign, so there is no way for marketers to extrapolate a hard-and-fast rule.
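To see why a single correction factor does not work, consider the ratio of DSP-reported to ad-server-reported conversions for each campaign. The sketch below uses invented campaign names and counts purely to illustrate the idea; the function name is hypothetical.

```python
from typing import Dict, Optional


def discrepancy_ratios(
    dsp_counts: Dict[str, int],
    ad_server_counts: Dict[str, int],
) -> Dict[str, Optional[float]]:
    """Ratio of DSP-reported to ad-server-reported conversions per campaign.

    None marks campaigns where the ad server recorded no conversions,
    so no ratio can be computed.
    """
    ratios: Dict[str, Optional[float]] = {}
    for campaign, dsp_n in dsp_counts.items():
        server_n = ad_server_counts.get(campaign, 0)
        ratios[campaign] = dsp_n / server_n if server_n else None
    return ratios


# Hypothetical conversion counts for three campaigns.
dsp_counts = {"spring_sale": 500, "brand": 120, "retargeting": 90}
ad_server_counts = {"spring_sale": 100, "brand": 100, "retargeting": 0}

ratios = discrepancy_ratios(dsp_counts, ad_server_counts)
# spring_sale -> 5.0, brand -> 1.2, retargeting -> None
```

In this made-up case one campaign is overstated fivefold, another by a fifth, and a third cannot be compared at all, so no single adjustment would correct all three.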

The better approach, then, is to examine the differences campaign by campaign to determine how best to use the available information to optimize. To ensure machine learning algorithms work effectively, marketers must stop relying on the limited data generated by DSPs and instead feed the algorithms more complete data sets that consider activity across all digital touch points.

Marketers can use machine learning to increase the efficiency and effectiveness of their advertising campaigns, enhancing customer engagement and maximizing ROI, but the algorithms that drive it must be fed the right data. It could be damaging for brands to rely on incomplete or inaccurate data to inform campaigns, so it is vital that marketers evaluate data on a case-by-case basis when optimizing. Where this doesn’t happen, marketers risk missing out on the exceptional opportunities machine learning has to offer.

Follow The Exchange Lab (@exchangelab) and AdExchanger (@adexchanger) on Twitter.
