Deterministic Data Isn’t What It Seems

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Matt Keiser, founder and CEO at LiveIntent.

Metadata is necessary to wield identity with the same accuracy as the triopoly – Facebook, Google and Amazon. The triopoly has great metadata because it gives data a day job.

The triopoly knows that human beings are more than just one identifier. Our identities can be thought of as snowflakes – composed of the cookies, anonymized PII and mobile IDs tied to people at their core.

For example, I have multiple email addresses from school, work, broadband providers and more. I no longer use to register for anything, but since it is persistent, it can still be used to target me if linked to my current emails, cookies or mobile IDs. In my personal identity snowflake, and my work email are linked to many of the same cookies and mobile IDs because I log in to check email and use them for registering for different websites and apps, based on whether I use them personally or for work.

These multiple email addresses explain part of the reason marketers are sometimes disappointed in CRM onboarding results outside of the triopoly. Marketers often onboard to target a user based on identity, starting with an email hash that’s converted to a cookie. But marketers measure attribution (and therefore success) by going from a cookie back to an email hash or comparing who registered against who was targeted.

When you follow the data, it becomes obvious that you need to look at clusters of data, not rows, to understand identity, since both my personal and work email addresses are true. This is how the triopoly connects the dots on identity.
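To make the difference between rows and clusters concrete, here is a minimal sketch that groups pairings into identity clusters with a union-find structure. The identifier strings and the `build_clusters` helper are hypothetical, invented for illustration; nothing here describes how any platform actually builds its graph.

```python
from collections import defaultdict


def build_clusters(pairs):
    """Group identifiers linked by any chain of pairings into clusters."""
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


# Hypothetical rows: each one is a single identifier-to-identifier pairing.
pairs = [
    ("hash:personal", "cookie:A"),
    ("hash:work", "cookie:A"),
    ("hash:work", "mobile:X"),
    ("hash:other", "cookie:Z"),
]

clusters = build_clusters(pairs)
for cluster in clusters:
    print(sorted(cluster))
```

Read row by row, the personal and work hashes look like two different people. Read as a cluster, they collapse into one identity because they share a cookie.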

Metadata: Key to building snowflakes

People-based marketing data models must go beyond a single cookie-to-hash pair or cookie-to-mobile ID pair and chart identity snowflakes, just like Facebook, Amazon and Google. The odds of a targeted cookie showing up at conversion are small. But an identity snowflake model alone can’t maximize performance because it doesn’t provide the signal needed to adjust the probability of driving the targeted event.

The triopoly model includes metadata about the different relationships within the identity snowflake to determine the strength between devices, browsers, apps and the person.

A deterministic cookie-to-person pairing that’s seen only once isn’t the same as one seen every day. Marketers need a way to score and differentiate the quality of connections in a graph, though the accuracy of the pair isn’t what truly matters: Outcomes matter.

If marketers want to drive conversions for $100 via retargeting, they could use identity snowflake metadata to drive results by modifying bids. They may bid 10 times as much on a cookie that logs in regularly as on a pair purchased from a third party that’s only been seen once. As long as they’re right and the regularly seen cookies convert 10 times as often, they’ve right-priced each bid and increased their likelihood to drive conversions.
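The arithmetic behind that bid adjustment can be sketched as an expected-value calculation. The $100 target comes from the example above; the conversion rates and the `right_priced_bid` function are assumptions chosen only to reproduce the 10-to-1 ratio, not measured figures.

```python
def right_priced_bid(target_cpa, conversion_rate):
    """Bid per impression that keeps expected cost per conversion on target."""
    return target_cpa * conversion_rate


TARGET_CPA = 100.0  # dollars per conversion, from the example above

# Assumed conversion rates (hypothetical): the regularly seen cookie
# converts 10 times as often as the pair that was seen only once.
strong_rate = 0.010
weak_rate = 0.001

strong_bid = right_priced_bid(TARGET_CPA, strong_rate)
weak_bid = right_priced_bid(TARGET_CPA, weak_rate)

print(f"strong pair bid: ${strong_bid:.2f}")
print(f"weak pair bid:   ${weak_bid:.2f}")
print(f"bid ratio:       {strong_bid / weak_bid:.1f}x")
```

Both bids aim at the same expected cost per conversion. If the assumed rates are wrong, so is the pricing, which is why the quality of the metadata behind them matters.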

It gets more complicated. My identity may be mapped to a dozen cookies across multiple devices, so only looking at IDs associated with one of my devices would create an inaccurate view of the customer journey because it will have gaps. But looking at all the devices mapped to my identity would cause expansion that’s equally inaccurate.

If targeting me yielded a conversion, does it matter which ID converted? Probably not, unless the goal of the campaign was to reach a specific device.

Using metadata properly allows marketers to scale up and down for accuracy and performance, based on what works for them. Cross-channel and cross-environment (on the same device) campaigns both require that marketers tune their data model based on what drives results for them.

A one-size-fits-all graph does not give a marketer or marketing platform the same control that the triopoly asserts.

Deterministic and the ‘truth’ 

The concept of metadata signals changes how we think about deterministic and probabilistic data. Deterministic data is described as “truth,” and probabilistic data is considered the output of triangulation methods used when “truth” is not available.

A snowflake model like the triopoly’s will show a variety of pairings that, while formed deterministically, are hardly the truth to a marketer that wants to reach me. However, the creation of metadata introduces an additional type of critical – and distinct – probabilistic data to the triopoly’s identity calculus. This “new” probabilistic data underlying the metadata highlights which pairs are most accurate and best for direct targeting, as well as which are less accurate and usable for expansion or lookalikes.
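One way to picture this is a score attached to each deterministic pair, used to route it to direct targeting or to expansion. The sketch below assumes a deliberately simple score (the share of days in a window on which the pair was observed) and an arbitrary 0.5 threshold. Both are hypothetical stand-ins for whatever metadata a real graph would carry.

```python
def pair_strength(days_seen, window_days):
    """Share of days in the window on which the pair was observed."""
    return days_seen / window_days


def split_by_strength(pair_scores, threshold=0.5):
    """Route strong pairs to direct targeting and weak ones to expansion."""
    direct = [pair for pair, score in pair_scores.items() if score >= threshold]
    expansion = [pair for pair, score in pair_scores.items() if score < threshold]
    return direct, expansion


WINDOW_DAYS = 30

# Hypothetical observations: days (out of 30) each pair was seen together.
observations = {
    ("hash:work", "cookie:A"): 28,      # logs in almost every day
    ("hash:personal", "cookie:A"): 15,  # logs in every other day
    ("hash:work", "cookie:Q"): 1,       # purchased pair, seen once
}

scores = {
    pair: pair_strength(days, WINDOW_DAYS) for pair, days in observations.items()
}
direct, expansion = split_by_strength(scores)

print("direct targeting:", direct)
print("expansion only:  ", expansion)
```

All three pairs are equally “deterministic.” Only the score separates the ones worth targeting directly from the ones better suited to expansion.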

While the identity snowflake concept undermines the orthodoxy of deterministic data as “the truth,” metadata offsets the efficacy loss driven by weak deterministic pairings within the identity snowflake. The best way to think about deterministic data within an identity graph is that its efficacy depends on the quality of this new form of probabilistic data that underlies metadata.

Metadata is a triopoly differentiator

Metadata allows the triopoly to wield identity with tremendous accuracy. Generating metadata requires visibility of the micro-events that build and refresh the basic data mappings within an identity graph.

By expanding the definition of an identity graph to include metadata, we also expand the definition of “signal.” Signal is traditionally defined as purchase behavior used for retargeting. In a world of people-based marketing and identity graphs, signal also includes “unpasteurized” information about which email addresses have opened emails or logged in, on which device and how frequently. This signal is the root of metadata. Facebook, Google and Amazon are potent collectors of metadata signal because their core businesses drive email traffic and user logins.

Targeting and measurement via snowflakes

Marketers need tools to prove attribution and measurement that tie together different environments on the same device or across devices and channels. Traditional attribution models outside of the walled gardens haven’t supported the complexity required to weave together attribution in an identity snowflake world.

Matching a targeted cookie with a cookie that converted doesn’t prove incrementality or illuminate the customer journey. Identity snowflake attribution requires “reversing” the flow of data against an identity graph, which is the opposite flow of traditional onboarding – going from online data back to durable identifiers – or outboarding.
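The reversed data flow can be sketched in a few lines: start from the cookies that converted, walk back through the graph to email hashes, and compare those against the hashes that were targeted. The graph contents, identifier names and `outboard` helper are hypothetical, and the sketch only shows the direction of the lookup, not a full attribution or incrementality method.

```python
def outboard(converted_cookies, cookie_to_hashes):
    """Map converted cookies back to the email hashes linked to them."""
    hashes = set()
    for cookie in converted_cookies:
        hashes |= cookie_to_hashes.get(cookie, set())
    return hashes


# Hypothetical identity graph, keyed by cookie.
cookie_to_hashes = {
    "cookie:A": {"hash:personal", "hash:work"},
    "cookie:B": {"hash:work"},
    "cookie:Z": {"hash:other"},
}

targeted_hashes = {"hash:work"}  # the CRM record that was onboarded
targeted_cookie = "cookie:A"     # the cookie the campaign actually bid on
converted_cookies = ["cookie:B"]  # the cookie seen at conversion

# Cookie-to-cookie matching misses the conversion entirely.
cookie_match = targeted_cookie in converted_cookies

# Outboarding resolves the converting cookie back to a durable identifier.
attributed_hashes = outboard(converted_cookies, cookie_to_hashes) & targeted_hashes

print("cookie-level match:", cookie_match)
print("hashes attributed: ", sorted(attributed_hashes))
```

In this toy case the person converted on a different cookie than the one targeted, so only the lookup through the durable identifier connects the two events.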

The triopoly is excellent at outboarding: It’s how it outperforms and claims the lion’s share of attribution. Outboarding is also why the Salesforce integration with Google 360 is potentially so transformative – it will allow brands to map first-party data to a previously walled-off portion of the identity snowflake, drastically increasing their understanding of what’s happening behind the Google wall. The integration is the cloud’s first truly differentiated news at the intersection of marketing and advertising in some time.

The triopoly knows that all deterministic data isn’t created equal, and the customer journey – unique to each buyer – can’t be explained with a basic data mapping model. To copy the triopoly’s success, living and breathing metadata at scale is needed. Only a few players have it, and all the tech and co-ops in the world can’t overcome sparse metadata.

Follow Matt Keiser (@mrkeiser), LiveIntent (@LiveIntent) and AdExchanger (@adexchanger) on Twitter.
