
Distinguishing Good Data From The Bad

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Nish Desai, senior director of technology, operations and partnerships at Xaxis.

Marketers have ever-growing streams of data and signals they can use to activate and optimize their advertising campaigns. But using this data to execute well is only half the battle.

To hit their mark, brand marketers must make their data both reliable and extensible to as many places as possible so they can pair it with partners’ first-party data, then fill any gaps with properly vetted third-party data. This means they need to ensure its proper collection, storage and upkeep.

Brands should investigate data collection methodologies to determine whether the data sets are complete, consistent and representative of the segments they are trying to reach. Properly vetting data requires human review, not just a technological solution.

Key considerations

What are my objectives? Before building a data set, brands need to ask themselves why they are building it. Where will it be used and for what purpose? This may seem counterintuitive – starting at what seems like the end – but understanding the reason for the data set’s creation will help ensure that the data that goes into it is correct.

What data is available and where did it come from? Understanding how the available data was collected can help determine how it will be used and how much value it will bring to the data set.

Where and how is the data stored? Data can be stored in house or by a partner. Often, centralizing the data in a data lake may be in the brand’s best interest.

How your data is stored is as important as where it is stored. To deliver the most value, the data must be current, so knowing how often it is refreshed is vital.

As privacy regulations emerge in more markets, it is essential that all data is collected and stored in a privacy-compliant manner.

What data is missing? Once brands have identified what data is available, they’ll likely find gaps that need to be filled. Ask potential partners direct questions and listen carefully to the answers. Vague and unclear definitions are a warning sign.

How is the data collected? To determine how a partner knows its first-party data is accurate, a brand may ask how phone numbers, ZIP codes (preferably plus six digits) or email addresses are collected and tested. It is a positive sign if the information comes from users’ self-declared registration data and there are 100,000 users in the data pool who match the segment desired by the brand. The confidence score for that kind of deterministic first-party registration data is generally higher than 90%. When users proactively state who they are and what they like, and consistently use a login, confidence that the data held by a platform or publisher is sound goes up.

But brands should be cautious if the publisher or platform touts a complicated methodology used to deduce someone’s identity. For data derived by probabilistic means, the confidence level is nearly always below 85% and is often closer to 50%.

How are the data sets structured? Is the data merged with data from other parties? Understanding how data is structured after it is collected is extremely important. A brand needs to ensure that any partner data is in an apples-to-apples configuration with its own so that the two can be easily merged. If other parties are involved in the sourcing of the data, the brand may need to inquire about their collection methods.
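As a rough illustration of an apples-to-apples check, the Python sketch below compares a brand's field layout with a partner's before any merge is attempted. The function name, the field names and the simple type labels are all hypothetical; a real data set would define its own schema.

```python
def schema_mismatches(brand_schema, partner_schema):
    """Compare two {field_name: type_label} mappings and report differences.

    Both mappings are hypothetical stand-ins for however each party
    documents its data layout.
    """
    brand_fields = set(brand_schema)
    partner_fields = set(partner_schema)
    # Fields the brand expects but the partner does not supply.
    missing = sorted(brand_fields - partner_fields)
    # Fields the partner supplies that the brand has no slot for.
    extra = sorted(partner_fields - brand_fields)
    # Shared fields whose declared types disagree.
    type_conflicts = {
        field: (brand_schema[field], partner_schema[field])
        for field in sorted(brand_fields & partner_fields)
        if brand_schema[field] != partner_schema[field]
    }
    return {"missing": missing, "extra": extra, "type_conflicts": type_conflicts}


brand = {"email_hash": "str", "zip": "str", "age": "int"}
partner = {"email_hash": "str", "zip": "int", "segment": "str"}
report = schema_mismatches(brand, partner)
print(report)
```

Any non-empty entry in the report is a prompt for a conversation with the partner, not an automatic rejection.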

How is the data kept current? Any data based on interests or other attributes that can change over time should be refreshed periodically. Knowing how and how often these attributes are refreshed is key.
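One simple way to make the refresh question concrete is to flag records that have not been updated within an agreed window. The sketch below is in Python and assumes each record carries an `updated_at` timestamp; that field name, the sample records and the 90-day window are illustrative assumptions, not recommendations.

```python
from datetime import datetime, timedelta, timezone


def stale_records(records, max_age_days, now=None):
    """Return the records whose 'updated_at' is older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [r for r in records if r["updated_at"] < cutoff]


# Hypothetical interest-based records with timezone-aware timestamps.
records = [
    {"user_id": "u1", "interest": "running",
     "updated_at": datetime(2024, 1, 20, tzinfo=timezone.utc)},
    {"user_id": "u2", "interest": "travel",
     "updated_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
]

as_of = datetime(2024, 1, 31, tzinfo=timezone.utc)
for record in stale_records(records, max_age_days=90, now=as_of):
    print(record["user_id"], "needs a refresh")
```

The right window depends on the attribute: an interest may shift within weeks, while a date of birth never changes.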

How is the data kept clean? If personally identifying information (PII) is collected, understanding how it is sanitized is essential. Is a clean room used to ensure that PII is removed? If values are being hashed or encrypted, understanding how this is done will help ensure that the brand is complying with the requisite privacy standards and industry best practices.
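When a partner says values are hashed, it helps to know exactly what that involves. The Python sketch below shows one common pattern: normalize an email address, then take its SHA-256 digest. The normalization rules here (trim whitespace, lowercase) are an assumption; a given partner or platform may specify different ones, and both sides must apply the same rules for hashed values to match.

```python
import hashlib


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase so equal addresses hash equally."""
    return raw.strip().lower()


def hash_email(raw: str) -> str:
    """Return the SHA-256 hex digest of the normalized address."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()


# Two spellings of the same (made-up) address produce the same digest.
first = hash_email("  Jane.Doe@Example.com ")
second = hash_email("jane.doe@example.com")
print(first == second)
```

Hashing on its own does not settle the compliance question, which is why the follow-up questions about clean rooms and handling procedures still matter.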

Don’t forget the other huge added benefit to making sure data is good: It helps brands better prepare for when third-party cookies are phased out.

Digital marketers preparing for that day know they have to build and test their first-party data stores to be as ready as possible. They need to have the best data they can and build on that to mix, match and build segments within the social platforms, Google’s Privacy Sandbox and Ads Data Hub, and to match with publishers’ data warehouses. Marketers and their partners need to use best practices in gathering and maintaining their data to keep data stores clean, accurate and current.

Follow Xaxis (@XaxisTweets) and AdExchanger (@adexchanger) on Twitter.
