The Need For Clean Data, And How To Get It

“Data Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Mark Zomick, managing partner of data integration at MEC North America.

As the media landscape becomes more fragmented, sophisticated and ROI-driven, the need for clean, complete and accurate data has grown more critical than ever. Big marketers now drive most of their budgets through some sort of market-mix modeling, making clean data the price of entry. Yet the important task of ensuring data quality within operational and business-intelligence applications is frequently overlooked or relegated to a low-level IT staffer.

Today’s fast-paced and quickly evolving media landscape has become a Wild West of sorts, in terms of how delivery is reported — and, therefore, how the data impacts media planning and buying. Meanwhile, researchers need accurate, consistent data in order to assign ongoing value. We might be eager to “try something new,” but we do require some tangible reporting of the deliverable in order to measure it.

Here are four of the top problems in audience-level reporting today:

1) Lack of infrastructure

Problem: Data vendors can easily convey metrics as ad-hoc reports, delivered as Excel files. But most companies aren’t able to convert this data into a consistent, repeatable, API-like structure. The culprit is usually a lack of attention to detail: errant commas or minor naming-convention differences in column headers cause automated processes to fail. You and I might know that “Columbus,” “Columbus OH” and “Columbus, OH” are the same thing, but the data process does not.

Solution: It’s important to hold data partners accountable. Even minor errors should be called out to ensure better practices and to help data providers manage data accurately. Pushing vendors to improve their data-delivery formats is vital.
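On the receiving end, a small normalization step can catch naming differences like the “Columbus” example before they break an automated load. The following is a minimal sketch in Python, not a description of any particular vendor’s feed: the alias table, the function name and the decision to treat a bare “Columbus” as the Ohio market are all assumptions made for illustration.

```python
import re

# Hypothetical alias table. A real one would be agreed with the data vendor,
# and the choice to map a bare "Columbus" to Ohio is an assumption.
CANONICAL_MARKETS = {
    "columbus": "Columbus, OH",
    "columbus oh": "Columbus, OH",
}


def normalize_market(raw: str) -> str:
    """Map a vendor-supplied market name to one canonical spelling."""
    # Lowercase, then collapse every run of punctuation or whitespace
    # into a single space so stray commas no longer matter.
    key = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    if key not in CANONICAL_MARKETS:
        # Fail loudly so the naming issue can be raised with the vendor
        # instead of silently creating a new market.
        raise ValueError(f"Unrecognized market name: {raw!r}")
    return CANONICAL_MARKETS[key]
```

Raising an error on an unknown name, rather than guessing, is a deliberate choice here: it surfaces the inconsistency so it can be called out to the data partner.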

2) The process works, so why check it?

Problem: “It worked fine for months. I don’t know why it stopped working.” Agencies, tech vendors and marketers are familiar with these calls, whether they’re making them or receiving them. Rather than fixing the process after it breaks, it’s better to put basic checks in place to ensure that regular files are delivered as expected. Reviewing file size, checking for expected values and looking for “null” or missing data are simple checks that should be performed regularly on outbound data feeds. Data reported as “null” or “n/a” is not the same as data reported as zero: a null means the value is unknown, while a zero is itself a reported value.

Solution: Agencies should insist that data providers check outbound files to reduce delays and extra agency work.
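The checks described above are simple enough to automate. Here is a rough sketch in Python of what a pre-delivery check might look like for a CSV feed. The column names, the list of null tokens and the use of row count as a stand-in for file size are assumptions for illustration, not a standard.

```python
import csv
import io

# Assumed spellings of "no data"; a real feed may use others.
NULL_TOKENS = {"", "null", "n/a", "na", "none"}


def parse_metric(raw):
    """Return None for missing values and a float otherwise.

    A reported zero stays 0.0; it is never treated as missing.
    """
    if raw is None or raw.strip().lower() in NULL_TOKENS:
        return None
    return float(raw)


def check_feed(text, expected_columns, min_rows, metric_column):
    """Return a list of human-readable problems found in a CSV feed."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    # Header check: catches renamed, reordered or missing columns.
    if reader.fieldnames != list(expected_columns):
        problems.append(f"unexpected header: {reader.fieldnames}")
        return problems
    rows = list(reader)
    # Size check: a feed that suddenly shrinks is suspect.
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows, expected at least {min_rows}")
    # Value check: flag missing, non-numeric and negative metrics.
    for row_no, row in enumerate(rows, start=1):
        try:
            value = parse_metric(row[metric_column])
        except ValueError:
            problems.append(f"row {row_no}: non-numeric {metric_column}")
            continue
        if value is None:
            problems.append(f"row {row_no}: missing {metric_column}")
        elif value < 0:
            problems.append(f"row {row_no}: negative {metric_column}")
    return problems
```

Run against a made-up three-row feed in which one market reports 0 impressions and another reports “n/a,” this flags only the “n/a” row, which keeps the distinction between zero and null intact.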

3) Distribution changes

Problem: If your cable system expands from 50 channels to 150 channels — or a vendor adds more mobile carriers to its mobile reporting system — researchers need to know so they can account for these changes in their analyses. Without this knowledge, they could mistakenly attribute the changes to popularity growth, rather than to increased distribution. Understanding how media is delivered and how the data is collected are keys to understanding how to measure the results.

Solution: Marketers need to be informed of distribution changes, as well as the expected impact of those changes, before they are made. Distribution channels should always start with a “base,” on top of which they can build. Like magazines’ “rate base,” this allows advertisers to monitor and understand the incremental value of the channel.
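To see why the base matters, it helps to separate how much of a change in a total comes from wider distribution and how much comes from a change in activity per unit of distribution. The sketch below, in Python, uses one simple way of splitting the change. The function name and the figures in the usage note are hypothetical, and this decomposition is only one of several reasonable conventions.

```python
def split_change(before_total, before_dist, after_total, after_dist):
    """Split the change in a total into two parts.

    Returns (from_distribution, from_rate), where:
    - from_distribution is the change expected if activity per unit of
      distribution (per channel, per carrier) had stayed at its old level;
    - from_rate is the remainder, caused by a change in activity per unit.
    The two parts always add up to after_total - before_total.
    """
    if before_dist <= 0 or after_dist <= 0:
        raise ValueError("distribution must be positive")
    before_rate = before_total / before_dist
    after_rate = after_total / after_dist
    from_distribution = before_rate * (after_dist - before_dist)
    from_rate = (after_rate - before_rate) * after_dist
    return from_distribution, from_rate
```

For instance, if a made-up total rises from 1,000 to 3,000 while distribution grows from 50 channels to 150, `split_change(1000, 50, 3000, 150)` attributes the entire increase of 2,000 to distribution and none to popularity.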

4) Changing methodology

Problem: As channels emerge and mature, it’s essential to alter data-collection methodology. But the effects of these changes in data collection — and the way vendors articulate these changes — can create a huge challenge for companies that use the data. Accounting for changing methodology is the biggest hurdle to data quality, but perhaps also the most nuanced.

For example, say you’re an advertiser using a system of spiders, tasked to complete a wide array of surfing, to monitor Internet activity. The spiders can be programmed with various demographic characteristics to mimic the surfing patterns of multiple groups of users. What they can’t do, it turns out, is simulate location. If you’re targeting customers based on their IP addresses, then the spiders will pick up consumers’ activity only if the spiders themselves are within the target IP range.

Now imagine that data collectors, having discovered this flaw in their systems, have begun to place local monitoring stations around the country to get a better read on consumer activity. That’s exactly what they should do to solve this problem. But if they don’t give marketers complete visibility into this methodology change, those advertisers — or their agencies — could easily mistake the resulting change in data as a change in customer activity. Simply put, the local monitoring would make it seem as if there is an increase in activity and spend, even if there isn’t.

Solution: First, it’s essential for data providers to test collection-methodology updates in advance and to be completely transparent about them. This seems simple enough, but is rarely done well. Data providers should communicate with their partners frequently to keep them abreast of these changes. Additionally, consider creating multiple data streams, leaving one as the “original” and adding the new one in parallel. This way, advertisers can (at least for a while) make a like-for-like comparison.
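When the original and revised streams run in parallel, the overlap period gives a direct read on how much of a shift is due to methodology alone. The following Python sketch estimates that effect as an average ratio over the shared periods. The function name, the period labels and the averaging approach are assumptions for illustration, not a prescribed method.

```python
from statistics import mean


def methodology_ratio(original, revised):
    """Average ratio of revised to original values over shared periods.

    Both arguments map a period label to the value reported for it.
    A result of 1.25 suggests the new methodology reads about 25 percent
    higher than the old one for the same underlying activity.
    """
    shared = sorted(set(original) & set(revised))
    # Skip periods where the original value is zero to avoid dividing by zero.
    ratios = [revised[p] / original[p] for p in shared if original[p]]
    if not ratios:
        raise ValueError("no overlapping periods with nonzero original values")
    return mean(ratios)
```

With such a ratio in hand, an analyst could restate older figures onto the new basis, or at least avoid reading the methodology jump as a jump in customer activity.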

In A Nutshell: Work Closely With Your Partners

The good news is that all of these problems have easy solutions. Working in partnership, vendors and their customers can agree on best practices. Most importantly, data providers, don’t be afraid to tell your partners the truth. If you find an error and need to restate the data, or find a flaw in your projection methodology, tell your partners. Open communication and transparency between vendors and customers serve everyone.
