When Evaluating Cross-Device Graph Technology, Look Beyond Match Accuracy


“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Rajiv Maheshwari, cross-device technology leader at Neustar.

With consumers increasingly accessing content and shopping via multiple devices, multiscreen and cross-device identity have become critical to advertisers.

Cross-device identity offers a unified view of individual consumers as they interact with brands’ advertising across multiple devices and platforms. That unified view opens the door to cross-device marketing, multitouch attribution, closed-loop reporting, unique reach and frequency measurement and opt-out compliance, among other desirable capabilities.

Industry giants with large user bases across mobile devices and desktops, such as Apple, Google and Facebook, have a clear advantage because users identify themselves by logging in via a single ID across all platforms. These companies have created so-called “walled gardens” offering deterministic identities at scale, potentially grabbing the lion’s share of advertisers’ spending.

Other companies in the ad tech ecosystem need a formidable alternative solution to compete. Several vendors have emerged over the last few years to fill the void with probabilistic cross-device matching technology that links browser cookies and device IDs to each user.

The vendors’ aggressive marketing has largely driven the conversation toward match accuracy of clustered identities in their “device graph.” Vendors have claimed accuracy ranging from 70% to 97%. But what they are really talking about is precision, incorrectly defined as accuracy.

I’ve had to evaluate several device graph technologies over the last year. I’ve found that, in general, the level of sophistication in currently available solutions on the market is still low compared to other successful applications of machine learning technologies, such as email spam filtering, recommendation engines, face recognition or fraud detection. Here are some of the criteria I’ve learned to consider.

Precision And Recall

Precision is the percentage of clustered identities in the device graph that are truly linked to the same individual. Recall, on the other hand, is the percentage of all existing user identities that are clustered in the device graph.

For example, say a given user has five different IDs across multiple browsers and devices, which I’ll call A, B, C, D and E. If IDs A, B and F – some other user’s ID – are clustered in the device graph, the device graph’s precision is 67%, since two of the three clustered IDs are correct. However, the recall is only 40%, since only two of the user’s five IDs are correctly clustered. Lower precision means more false positives, while lower recall means more false negatives.
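The arithmetic in this example can be checked with a few lines of Python. This is only a sketch for a single user and a single cluster; the function name `precision_recall` is made up for illustration and is not taken from any vendor's tooling.

```python
def precision_recall(cluster, true_ids):
    """Return (precision, recall) for one cluster against one user's true IDs."""
    cluster, true_ids = set(cluster), set(true_ids)
    if not cluster or not true_ids:
        raise ValueError("cluster and true_ids must both be non-empty")
    correct = cluster & true_ids               # IDs that really belong to the user
    precision = len(correct) / len(cluster)    # share of the cluster that is right
    recall = len(correct) / len(true_ids)      # share of the user's IDs that were found
    return precision, recall


true_ids = {"A", "B", "C", "D", "E"}   # the user's five real IDs
cluster = {"A", "B", "F"}              # what the device graph clustered

precision, recall = precision_recall(cluster, true_ids)
print(f"precision={precision:.0%}, recall={recall:.0%}")
# prints: precision=67%, recall=40%
```

A real evaluation would average these figures over many users with known ground truth, which is typically taken from a deterministic (login-based) data set.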


Depending upon your target use cases, you may prefer higher precision to recall or vice versa. For example, higher precision is desirable if marketers want to retarget an audience with sequential messaging. Higher recall is desirable if the goal is to increase audience reach by acquiring new screens. Some vendors may also provide the ability to adjust precision vs. recall via a cluster affinity score for IDs. Increasing recall typically also increases scale.
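To make the trade-off concrete, here is a minimal sketch of how a cluster affinity score could be used as a dial. The scores, the extra ID G and the function names are all invented for illustration; actual vendors may expose this differently, if at all.

```python
# The user's real IDs, as in the earlier example.
TRUE_IDS = {"A", "B", "C", "D", "E"}

# Hypothetical affinity scores: how strongly each candidate ID is
# believed to belong to this user's cluster. F and G are other users' IDs.
SCORES = {"A": 0.95, "B": 0.90, "C": 0.70, "F": 0.65,
          "D": 0.50, "G": 0.40, "E": 0.30}


def cluster_at(threshold, scores=SCORES):
    """Keep only the IDs whose affinity score meets the threshold."""
    return {id_ for id_, score in scores.items() if score >= threshold}


def evaluate(threshold):
    """Return (cluster size, precision, recall) at a given threshold."""
    cluster = cluster_at(threshold)
    correct = cluster & TRUE_IDS
    precision = len(correct) / len(cluster) if cluster else 0.0
    recall = len(correct) / len(TRUE_IDS)
    return len(cluster), precision, recall


for threshold in (0.8, 0.6, 0.2):
    size, precision, recall = evaluate(threshold)
    print(f"threshold={threshold}: size={size}, "
          f"precision={precision:.0%}, recall={recall:.0%}")
# prints:
# threshold=0.8: size=2, precision=100%, recall=40%
# threshold=0.6: size=4, precision=75%, recall=60%
# threshold=0.2: size=7, precision=71%, recall=100%
```

Lowering the threshold grows the cluster and raises recall, but each step also lets in more of the other users' IDs.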

Partial vs. Fully Clustered Cross-Device Identities

Ideally, each cluster in the device graph should have all the IDs linked to the same individual. From our previous example, IDs A, B, C, D and E would constitute a full cluster. However, the vendor’s device graph may provide only pairwise or partial clusters with IDs spread across multiple clusters, as illustrated by the following ID tuples:

  • [A, B, F]
  • [B, C]
  • [D, E]
  • [B, E]

Many of the cross-device business use cases, such as multitouch attribution, depend on accurately assembling the entire user events chain. It is a lot simpler to stitch together users’ journeys across multiscreen touch points with fully clustered cross-device identities. Assembling user events chains from partially clustered identities is computationally intensive when dealing with billions of user events. Partial identity clusters are only a partial solution to the cross-device identity problem.
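One common way to consolidate partial clusters is to treat every tuple as a set of links and merge anything that shares an ID, using a union-find (disjoint-set) structure. The sketch below applies that idea to the four tuples above. It is an illustration, not a description of how any particular vendor builds its graph, and `merge_clusters` is a hypothetical name.

```python
def merge_clusters(tuples):
    """Merge partial ID tuples that share at least one ID into full clusters."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root,
        # shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for ids in tuples:
        if not ids:
            continue
        first = ids[0]
        find(first)                  # make sure single-ID tuples are registered
        for other in ids[1:]:
            union(first, other)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


partial = [["A", "B", "F"], ["B", "C"], ["D", "E"], ["B", "E"]]
print(merge_clusters(partial))
# prints: [['A', 'B', 'C', 'D', 'E', 'F']]
```

Note that the merged cluster includes F, which the precision example identified as another user’s ID. Merging is transitive, so a single false positive in one partial cluster is carried into the consolidated one. At the scale of billions of events this merge also has to run as a distributed job, which is where the computational cost comes from.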

Scale Of Clustered Cross-Device Identities

Vendors often tout that they have more than a billion IDs in their device graph. However, what matters from a cross-device perspective is how many of those IDs are clustered. Probabilistic and deterministic matching can only tell which IDs are linked to the same individual.

A standalone ID does not necessarily imply that the corresponding user has only one digital ID in the universe. In that respect, standalone IDs in the device graph are about as important as IDs that are not in the device graph, meaning they provide no additional useful information. So although a vendor may have more IDs in its device graph, many may be useless.
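When comparing vendors, it can help to separate the headline ID count from the clustered ID count. The following sketch assumes the device graph can be exported as a list of non-overlapping clusters; the function name and the sample data are hypothetical.

```python
def graph_scale(clusters):
    """Split a device graph's ID count into clustered and standalone IDs.

    `clusters` is an iterable of ID collections, assumed not to overlap.
    A cluster with a single ID is treated as standalone.
    """
    sizes = [len(set(cluster)) for cluster in clusters]
    total = sum(sizes)
    clustered = sum(size for size in sizes if size > 1)
    return {
        "total_ids": total,
        "clustered_ids": clustered,
        "standalone_ids": total - clustered,
        "clustered_share": clustered / total if total else 0.0,
    }


# Made-up graph: two real clusters and three standalone IDs.
graph = [{"A", "B", "C"}, {"D", "E"}, {"G"}, {"H"}, {"I"}]
print(graph_scale(graph))
# prints: {'total_ids': 8, 'clustered_ids': 5, 'standalone_ids': 3, 'clustered_share': 0.625}
```

In this toy graph the vendor could advertise eight IDs, but only five of them carry any cross-device information.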

Individual And Household-Level Hierarchical Clustering

Finally, if your marketing goals require both individual and household-level granularity, it may be a good idea to ask your vendor if they can support hierarchical clustering of individual-level cross-device identity clusters into households. There are several algorithms that perform hierarchical clustering. Redesigning the algorithms to perform at big data scale is difficult but certainly achievable with currently available technologies.
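As a rough picture of what the output of such clustering might look like, the sketch below nests individual-level clusters under households. It is not a hierarchical clustering algorithm; it only illustrates the resulting two-level structure. The `household_of` mapping stands in for whatever signal a vendor uses to assign individuals to households, and every name and value here is hypothetical.

```python
from collections import defaultdict


def build_households(individuals, household_of):
    """Nest individual-level ID clusters under their household.

    individuals:  {person_key: set of device/browser IDs}
    household_of: {person_key: household_key} (assumed to be supplied
                  by the vendor's household-level matching)
    """
    households = defaultdict(dict)
    for person, ids in individuals.items():
        households[household_of[person]][person] = sorted(ids)
    return dict(households)


def ids_at_level(households, household, person=None):
    """Return IDs for one person, or for the whole household if person is None."""
    members = households[household]
    if person is not None:
        return list(members[person])
    return sorted(id_ for ids in members.values() for id_ in ids)


individuals = {
    "user_1": {"A", "B", "C"},
    "user_2": {"G", "H"},
    "user_3": {"K"},
}
household_of = {"user_1": "hh_1", "user_2": "hh_1", "user_3": "hh_2"}

households = build_households(individuals, household_of)
print(ids_at_level(households, "hh_1"))             # household-level targeting
print(ids_at_level(households, "hh_1", "user_2"))   # individual-level targeting
# prints:
# ['A', 'B', 'C', 'G', 'H']
# ['G', 'H']
```

With a structure like this, the same graph can serve a household-level campaign and an individual-level one without being rebuilt.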


I find it encouraging to see growing interest and momentum in cross-device identity solutions. With several readily available machine learning software libraries and tools, the entry barrier is set fairly low.

On the flip side, there are significant data science and engineering challenges to overcome. A comprehensive solution would also need access to online, mobile and offline identity data points. Hopefully, increased competition will drive innovation in this space.

Follow Neustar (@Neustar) and AdExchanger (@adexchanger) on Twitter.
