Distinguishing Good Data From The Bad


“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Nish Desai, senior director of technology, operations and partnerships at Xaxis.

Marketers have ever-growing streams of data and signals they can use to activate and optimize their advertising campaigns. But using this data to execute well is only half the battle.

To hit their mark, brand marketers must make their data both reliable and usable in as many places as possible, so they can pair it with partners’ first-party data and then fill any gaps with properly vetted third-party data. That means ensuring the data is properly collected, stored and maintained.

Brands should investigate data collection methodologies to determine whether the data sets are complete, consistent and representative of the segments they are trying to reach. Properly vetting data requires human judgment, not just a technological solution.
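Part of that vetting can be scripted before a human reviews the results. The sketch below is a minimal illustration, assuming records arrive as Python dicts; the field names, sample records and target segment mix are hypothetical. It measures completeness (the share of records with a value for each field) and representativeness (how far each segment’s observed share sits from the share the brand expected).

```python
from collections import Counter

def completeness(records, fields):
    """Share of records with a non-empty value for each field."""
    total = len(records)
    return {
        field: (sum(1 for r in records if r.get(field) not in (None, "")) / total) if total else 0.0
        for field in fields
    }

def representation_gap(records, segment_field, target_shares):
    """Observed share of each segment minus the share the brand expected.

    Positive values mean the segment is over-represented in the data set;
    negative values mean it is under-represented.
    """
    counts = Counter(r.get(segment_field) for r in records)
    total = len(records)
    return {
        segment: (counts[segment] / total if total else 0.0) - share
        for segment, share in target_shares.items()
    }

# Hypothetical sample records, for illustration only.
records = [
    {"email": "a@example.com", "zip": "10001", "segment": "parents"},
    {"email": "", "zip": "94105", "segment": "gamers"},
    {"email": "b@example.com", "zip": None, "segment": "parents"},
    {"email": "c@example.com", "zip": "60601", "segment": "parents"},
]
print(completeness(records, ["email", "zip"]))  # {'email': 0.75, 'zip': 0.75}
print(representation_gap(records, "segment", {"parents": 0.5, "gamers": 0.5}))
# {'parents': 0.25, 'gamers': -0.25}
```

Numbers like these only flag where to look; deciding whether a skewed mix is acceptable for the campaign is still a human call.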

Key considerations

What are my objectives? Before building a data set, brands need to ask themselves why they are building it. Where will it be used and for what purpose? This may seem counterintuitive – starting at what seems like the end – but understanding the reason for the data set’s creation will help ensure that the data that goes into it is correct.

What data is available and where did it come from? Understanding how the available data was collected can help determine how it will be used and how much value it will bring to the data set.

Where and how is the data stored? Data can be stored in house or by a partner. Centralizing it in a data lake is often in the brand’s best interest.

Knowing how your data is stored is as important as knowing where it is stored. To get the most value from the data, it must be current, so knowing how often it is refreshed is vital.

As privacy regulations emerge in more markets, it is essential that all data is collected and stored in a privacy-compliant manner.

What data is missing? Once brands have identified what data is available, they’ll likely find gaps that need to be filled. Ask potential partners direct questions and listen carefully to the answers. Vague and unclear definitions are a warning sign.
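A gap analysis can start from the objective: list the attributes the campaign needs, compare them with how well in-house data covers each one, and check whether a partner’s segment definitions actually answer the basic questions. The sketch below assumes hypothetical coverage figures, a hypothetical 80% coverage threshold and a hypothetical set of spec keys; a blank key is treated as the kind of vague definition worth pressing a partner on.

```python
# Keys we assume a well-defined partner segment spec should fill in (hypothetical).
REQUIRED_SPEC_KEYS = ("definition", "source", "collection_method", "refresh_cadence")

def find_gaps(required_attributes, coverage, min_coverage=0.8):
    """Attributes the objective needs that are missing or thinly covered in house."""
    return sorted(a for a in required_attributes if coverage.get(a, 0.0) < min_coverage)

def vague_specs(partner_segments):
    """Names of partner segments whose spec leaves any required key blank."""
    return sorted(
        name
        for name, spec in partner_segments.items()
        if any(not str(spec.get(key) or "").strip() for key in REQUIRED_SPEC_KEYS)
    )

coverage = {"age_bracket": 0.95, "household_income": 0.40}
print(find_gaps(["age_bracket", "household_income", "household_size"], coverage))
# ['household_income', 'household_size']

partner_segments = {
    "auto_intenders": {
        "definition": "Viewed 3+ dealer pages in the last 30 days",
        "source": "publisher site logs",
        "collection_method": "first-party pixel",
        "refresh_cadence": "weekly",
    },
    "affluent_shoppers": {"definition": "", "source": "proprietary model"},
}
print(vague_specs(partner_segments))  # ['affluent_shoppers']
```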

How is the data collected? To determine how a partner knows its first-party data is accurate, a brand may ask how phone numbers, ZIP codes (preferably ZIP+4) or email addresses are collected and tested. It is a positive sign if the information comes from users’ self-declared registration data and there are 100,000 users in the data pool who match the segment the brand wants. The confidence score for that kind of deterministic first-party registration data is generally higher than 90%. When users proactively state who they are and what they like, and consistently log in, there is more reason to trust that the data held by a platform or publisher is sound.
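“Tested” can mean many things, and a partner should be able to explain its own checks. As a rough illustration of the most basic layer, the sketch below applies format checks to US phone numbers, ZIP codes and email addresses. The patterns are assumptions chosen for the example: the ZIP check accepts a five-digit ZIP or ZIP+4, and the email check is a loose shape test, not a full RFC 5322 validator. Format checks cannot confirm that an address or number actually belongs to the user; that takes verification such as confirmation emails.

```python
import re
from typing import Optional

ZIP = re.compile(r"\d{5}(-\d{4})?")             # 5-digit ZIP or ZIP+4
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # loose shape check only

def normalize_us_phone(raw: str) -> Optional[str]:
    """Strip formatting and a leading US country code; return 10 digits or None."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

def format_problems(record: dict) -> list:
    """Names of fields that fail a basic format check (missing fields fail too)."""
    problems = []
    if not ZIP.fullmatch(record.get("zip") or ""):
        problems.append("zip")
    if not EMAIL.fullmatch(record.get("email") or ""):
        problems.append("email")
    if normalize_us_phone(record.get("phone") or "") is None:
        problems.append("phone")
    return problems

print(format_problems({"zip": "10001-1234", "email": "pat@example.com", "phone": "(212) 555-0100"}))  # []
print(format_problems({"zip": "1000", "email": "not-an-email", "phone": "555-0100"}))
# ['zip', 'email', 'phone']
```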

But brands should be cautious if the publisher or platform touts a complicated methodology used to deduce someone’s identity. For data derived by probabilistic means, the confidence level is nearly always below 85% and is often closer to 50%.
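The rough thresholds above can be turned into a simple triage rule for partner segments. In the sketch below, the `Segment` type, the labels it returns and the exact cutoffs (90% for acceptance, 85% as the ceiling for probabilistic claims) are illustrative assumptions drawn from the figures in this column, not an industry standard. A probabilistic segment that claims a confidence above that ceiling is routed to “question the methodology” rather than accepted.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    name: str
    method: str        # "deterministic" or "probabilistic", as reported by the partner
    confidence: float  # partner-reported confidence score, 0.0 to 1.0

def review(segment: Segment, min_confidence: float = 0.90,
           probabilistic_ceiling: float = 0.85) -> str:
    """Triage a partner segment using the rough thresholds cited in the column."""
    if segment.method == "probabilistic" and segment.confidence > probabilistic_ceiling:
        # A probabilistic claim above the typical ceiling warrants a closer look.
        return "question the methodology"
    if segment.confidence >= min_confidence:
        return "accept"
    return "use with caution"

print(review(Segment("registered_runners", "deterministic", 0.93)))  # accept
print(review(Segment("inferred_runners", "probabilistic", 0.92)))    # question the methodology
print(review(Segment("modeled_parents", "probabilistic", 0.55)))     # use with caution
```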

How are the data sets structured? Is the data merged with data from other parties? Understanding how data is structured after it is collected is extremely important. A brand needs to ensure that any partner data is in an apples-to-apples configuration with its own so that the two can be easily merged. If other parties are involved in sourcing the data, the brand may need to ask about their collection methods too.
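Getting partner data into an apples-to-apples shape usually means renaming its fields to the brand’s schema and normalizing the join key the same way on both sides. The sketch below is a minimal illustration, assuming hypothetical partner column names and a raw email as the join key; in practice the key would more often be a hashed identifier. It performs an inner join in which the brand’s own values win any conflict.

```python
# Hypothetical mapping from a partner's column names to the brand's schema.
PARTNER_FIELD_MAP = {"EmailAddr": "email", "Zip": "zip", "SegmentName": "segment"}

def normalize_email(value):
    """Trim whitespace and lowercase so both sides compare the same string."""
    return (value or "").strip().lower()

def to_brand_schema(partner_row, field_map):
    """Rename partner columns to the brand's names and drop unmapped columns."""
    row = {field_map[k]: v for k, v in partner_row.items() if k in field_map}
    row["email"] = normalize_email(row.get("email"))
    return row

def merge_on_email(brand_rows, partner_rows, field_map):
    """Inner-join brand and partner rows on normalized email; brand values win on conflicts."""
    partner_by_email = {}
    for raw in partner_rows:
        row = to_brand_schema(raw, field_map)
        if row["email"]:
            partner_by_email[row["email"]] = row
    merged = []
    for brand_row in brand_rows:
        key = normalize_email(brand_row.get("email"))
        if key in partner_by_email:
            merged.append({**partner_by_email[key], **brand_row, "email": key})
    return merged

brand_rows = [{"email": "Pat@Example.com", "ltv": 120}]
partner_rows = [
    {"EmailAddr": " pat@example.com ", "Zip": "10001", "SegmentName": "runners"},
    {"EmailAddr": "sam@example.com", "Zip": "94105", "SegmentName": "gamers"},
]
print(merge_on_email(brand_rows, partner_rows, PARTNER_FIELD_MAP))
# [{'email': 'pat@example.com', 'zip': '10001', 'segment': 'runners', 'ltv': 120}]
```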

How is the data kept current? Any data based on interests or other attributes that can change over time should be refreshed periodically. Knowing how and how often these attributes are refreshed is key.
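One way to make “how often” concrete is to give each attribute a maximum age and flag anything older. In the sketch below, the attribute names, the per-attribute windows and the 30-day default are hypothetical; fast-moving interests get a short window, while slower-changing attributes get a longer one.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_MAX_AGE = timedelta(days=30)  # hypothetical default refresh window

def stale_attributes(last_refreshed, max_age, now=None):
    """Attributes whose last refresh is older than their allowed maximum age."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        attr for attr, refreshed_at in last_refreshed.items()
        if now - refreshed_at > max_age.get(attr, DEFAULT_MAX_AGE)
    )

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
last_refreshed = {
    "interest_running": datetime(2024, 3, 1, tzinfo=timezone.utc),
    "zip": datetime(2024, 5, 25, tzinfo=timezone.utc),
    "age_bracket": datetime(2023, 12, 1, tzinfo=timezone.utc),
}
max_age = {
    "interest_running": timedelta(days=30),
    "zip": timedelta(days=180),
    "age_bracket": timedelta(days=365),
}
print(stale_attributes(last_refreshed, max_age, now))  # ['interest_running']
```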

How is the data kept clean? If personally identifiable information (PII) is collected, understanding how it is sanitized is essential. Is a clean room used to ensure that PII is removed? If values are hashed or encrypted, understanding how this is done will help ensure that the brand complies with the requisite privacy standards and industry best practices.
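A common pattern for identifiers that must still be matched is to normalize them (trim whitespace, lowercase) and then hash them with SHA-256, so that two parties hashing the same address get the same value. The sketch below assumes that approach; the list of raw PII fields to drop is hypothetical, and any given platform’s exact normalization rules should be checked against its own documentation. Hashing is pseudonymization rather than anonymization: anyone holding a list of candidate emails can hash them and compare.

```python
import hashlib

RAW_PII_FIELDS = ("name", "phone", "street_address")  # hypothetical fields to drop

def hash_email(email: str) -> str:
    """Trim and lowercase the email, then return its SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def sanitize(record: dict) -> dict:
    """Replace the raw email with its hash and drop other raw PII fields."""
    clean = {k: v for k, v in record.items() if k not in RAW_PII_FIELDS and k != "email"}
    if record.get("email"):
        clean["email_sha256"] = hash_email(record["email"])
    return clean

record = {"email": " Pat@Example.com ", "name": "Pat Lee", "phone": "212-555-0100", "segment": "runners"}
print(sanitize(record))  # {'segment': 'runners', 'email_sha256': '<64 hex characters>'}
```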

Don’t forget the other big benefit of making sure data is good: It helps brands better prepare for when third-party cookies are phased out.

Digital marketers preparing for that day know they have to build and test their first-party data stores to be as ready as possible. They need the best data they can get, and they need to build on it to mix and match segments within the social platforms, Google’s Privacy Sandbox and Ads Data Hub, and to match against publishers’ data warehouses. Marketers and their partners need to follow best practices in gathering and maintaining their data to keep data stores clean, accurate and current.

