TOM CHAVEZ: The shorthand we use internally is we call it CDUI: Cross-Device User Identification. We see a lot of our clients push us into this and we’re glad they have because we understand how strategic it is.
Do publishers have different needs than marketers?
VIVEK VAIDYA: The challenges are similar in the sense that marketers also want the ability to target users across devices. The most common use case we see is one of frequency capping. They want a true frequency cap, and not a cookie-based frequency cap. The challenge for marketers is they don’t have access to a data set to train algorithms that will come up with the CDUI intelligence we’re talking about. That’s where the service we provide comes into play for them. We can provide not just a global frequency cap in terms of reaching the same cookie across three different media execution systems; we can reach the same user on three different devices across three different execution systems. That’s the most common use case we see as far as cross-device user identification for marketers.
And then once you put in the framework to do true frequency capping you can start to consider true attribution, true people-based attribution, as opposed to cookie-based attribution across devices.
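[Ed: The difference between a cookie-based cap and a "true" frequency cap can be sketched in a few lines of Python. This is an illustration under our own assumptions, not Krux's implementation: the `DEVICE_TO_USER` table, the `FrequencyCap` class and all IDs are hypothetical stand-ins for whatever a cross-device service would return.]

```python
from collections import defaultdict

# Hypothetical mapping from device-level IDs (cookies, mobile IDs) to a
# cross-device user ID. In practice a CDUI service would supply this.
DEVICE_TO_USER = {
    "cookie-desktop-1": "user-A",
    "cookie-tablet-7": "user-A",
    "mobile-id-3": "user-A",
    "cookie-desktop-2": "user-B",
}


class FrequencyCap:
    """Counts impressions per key and refuses to serve past the cap."""

    def __init__(self, cap, resolve=None):
        self.cap = cap
        # resolve maps a device ID to the key we count against.
        # By default the device ID itself is the key (cookie-based cap).
        self.resolve = resolve or (lambda device_id: device_id)
        self.counts = defaultdict(int)

    def try_serve(self, device_id):
        key = self.resolve(device_id)
        if self.counts[key] >= self.cap:
            return False
        self.counts[key] += 1
        return True


def user_key(device_id):
    # Fall back to the device ID when no cross-device match is known.
    return DEVICE_TO_USER.get(device_id, device_id)


def count_served(capper, device_ids):
    return sum(capper.try_serve(d) for d in device_ids)


# The same person is seen twice on each of three devices.
impressions = ["cookie-desktop-1", "cookie-tablet-7", "mobile-id-3"] * 2
cookie_based = count_served(FrequencyCap(cap=2), impressions)
user_based = count_served(FrequencyCap(cap=2, resolve=user_key), impressions)
```

[Ed: With a cap of two, the cookie-based counter lets all six impressions through because each device is counted separately, while the user-based counter stops after two.]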
What’s the status of the cookie? Have the reports of its death been greatly exaggerated?
CHAVEZ: We agree people have jumped the gun on that one. It’s not clear to us that cookies are going away soon. But we [also shouldn’t] operate in an environment where it’s one to the exclusion of the other. We talk about the uber ID, that supra bookkeeping mechanism that lets us see and manage users across devices through the unique data signatures they leave behind. Cookies are an easy data signature that’s powered a lot of commerce and advertising on the Web to date, but it’s certainly not the only one. We’re preparing for all of the scenarios here, and are already mapping all of the users we see to that uber ID and seeing a lot of signal. But I think we’re in that horseless carriage: not a buggy, not yet a car. We’re going to be operating around both frameworks for a good while longer.
What are your publisher-marketer splits currently?
CHAVEZ: It’s about 80-20. We expect that to shift pretty aggressively in the next 18 months.
In the early phases of our company’s build-out, we were focused on publishers, and we’re moving aggressively into marketers. What’s been interesting for us to learn is marketers have the same concerns around protection, management and monetization that publishers have. So that’s been a very positive dynamic for us. Also, the market is fatigued by third-party data and searching increasingly for first-party differentiation. That’s an added benefit for marketers, if they can access proprietary data in a policy-managed way.
How does Krux identify users across devices?
CHAVEZ: There are two modes: the deterministic case and the statistical approach. [Ed: Other DMP companies used the adjective “probabilistic.” This is a tomato-tomahto type of distinction.] The Telegraph is one of the publishers that’s early to the punch. They had a batch of named users and wanted to find them across the properties they control. In the deterministic case, they provide us with a key, usually email identification, that allows us to match users across, for example, mobile vs. desktop.
That’s deterministic because there’s no uncertainty whether we found a user uniquely identified by that key. It’s certainly a pattern a lot of our publishers have as they have subscription registration already. They’re looking to reach those users in a controlled way across all those screens. Marketers don’t want to carpet bomb the wrong users more than is effective or useful and it’s strategic for them to sell cross-device campaigns.
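[Ed: A minimal Python sketch of deterministic matching, assuming the key is an email address seen at login. Hashing the normalized address before joining is our assumption, not something described in the interview, and `email_key`, `match_devices` and the sample IDs are hypothetical.]

```python
import hashlib
from collections import defaultdict


def email_key(email):
    """Normalize and hash an email so the raw address is not the join key."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def match_devices(logins):
    """Group device IDs that share the same login key.

    logins: iterable of (device_id, email) pairs seen at sign-in.
    Returns {key: set of device IDs} for keys seen on more than one device.
    """
    devices_by_key = defaultdict(set)
    for device_id, email in logins:
        devices_by_key[email_key(email)].add(device_id)
    return {k: v for k, v in devices_by_key.items() if len(v) > 1}


logins = [
    ("desktop-cookie-1", "Reader@Example.com"),
    ("mobile-id-9", "reader@example.com "),
    ("desktop-cookie-2", "other@example.com"),
]
matches = match_devices(logins)
```

[Ed: Here the desktop cookie and the mobile ID end up under one key because both logins normalize to the same address. There is no score and no threshold, which is what makes the match deterministic.]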
CHAVEZ: In the statistical case, you don’t have a unique identifier that deterministically identifies a single user, but you have a data signature. That has a lot of characteristics or attributes of the user of interest. Some of that could be tied to IP address, it could be tied to a data signature around configurations in a particular browser. All of these bits of information are useful grist for the mill, allowing us to build a statistical profile of the user.
To what extent is this fingerprinting?
VAIDYA: A fingerprinting solution is not the same as a cross-device user identification solution. There are companies out there that sell fingerprinting solutions, but they’re really just a replacement for the cookie.
CHAVEZ: Fingerprinting is an easily misinterpreted term and it sounds sinister too. Fingerprints uniquely identify human beings. There’s nothing about browser configs that remotely has as much signaling power as an actual fingerprint. It’s just a reflection of how you’ve pimped out your browser, and that’s a useful piece of data you can use to build more reliable signals about individual users. But our approach is that’s one of many types of data signatures that you need to integrate into a more reliable identification of the user in a way that’s still sensitive to privacy and governance constraints.
How do you determine or improve the accuracy of the statistical approach?
CHAVEZ: The game of course is to boost that signal using the deterministic information you have. There’s a machine-learning aspect where you’re basically taking a seed and using that to infer more interesting things about a broader set of users who look, sound and act like the users you have in that seed.
What’s interesting for us and our customers is that we’ve amassed an interesting level of scale at Krux. We see over 1.5 billion users across the globe, and that provides an interesting training set to power the statistical CDUI we provide.
VAIDYA: To add to what Tom said, in terms of data signatures we track, we use the IP address, the user’s device signature, their browsing patterns and where they browsed from. All of these different elements go into the [statistical] model. The model ends up computing a score that looks through cookies and says the similarity score between these two cookies is X. If X is past a certain threshold, we determine that those two cookies represent the same user across multiple devices.
The deterministic approach and the truth set are used as the training and test set for verifying the accuracy of the similarity score we computed. It’s a continuous feedback loop. As we get more data from our first-party registered data from our clients, it feeds the deterministic piece, which feeds the machine-learning piece and the cycle goes from there.
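[Ed: To make the score-and-threshold idea concrete, here is a toy Python sketch. It is not Krux's model: we substitute a hand-weighted Jaccard overlap for whatever the real machine-learning model computes, and the weights, the 0.5 threshold, the field names and the sample profiles are all invented for illustration. The second function shows how a deterministic truth set of known pairs could be used to check the predictions.]

```python
def similarity(a, b):
    """Toy similarity: weighted Jaccard overlap of signals on two cookies."""
    weights = {"ips": 0.5, "sites": 0.3, "hours": 0.2}  # illustrative weights
    score = 0.0
    for field, weight in weights.items():
        union = a[field] | b[field]
        if union:
            score += weight * len(a[field] & b[field]) / len(union)
    return score


def same_user(a, b, threshold=0.5):
    """Declare two cookies the same user when the score passes the threshold."""
    return similarity(a, b) >= threshold


def precision_recall(profiles, truth_pairs, candidate_pairs, threshold=0.5):
    """Compare predicted matches to a deterministic truth set of known pairs."""
    predicted = {
        pair for pair in candidate_pairs
        if same_user(profiles[pair[0]], profiles[pair[1]], threshold)
    }
    true_pos = len(predicted & truth_pairs)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth_pairs) if truth_pairs else 0.0
    return precision, recall


profiles = {
    "c1": {"ips": {"10.0.0.1"}, "sites": {"news", "sport"}, "hours": {8, 21}},
    "c2": {"ips": {"10.0.0.1"}, "sites": {"news", "sport"}, "hours": {21}},
    "c3": {"ips": {"10.0.0.9"}, "sites": {"cars"}, "hours": {13}},
}
candidate_pairs = {("c1", "c2"), ("c1", "c3"), ("c2", "c3")}
truth_pairs = {("c1", "c2")}  # known from a shared login, for example
precision, recall = precision_recall(profiles, truth_pairs, candidate_pairs)
```

[Ed: In a real system the truth set would be split into training and test portions and the threshold tuned against it. With three cookies the numbers mean nothing, which is the point Chavez makes next about truth-set size.]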
How big does the truth set need to be?
CHAVEZ: We’ve bumped into some folks who are peddling similar technologies. We want to make sure we understand what other offerings enable. In a couple of recent cases, we know there are folks showing up with truth sets of 2,000 to 5,000 users. When you’re trying to reliably identify profiles for millions of users on the Web, there’s nothing useful with 2,000. You need a truth set that’s measured in the tens of millions to get a reliable signal or you’re wallowing in noise.
VAIDYA: Especially when you’re doing this across the globe.
CHAVEZ: You need to be broad and you need to be deep. Across the globe, you’re not duplicating and learning from the same kind of users. You actually need a broad enough footprint that lets you reliably identify users in lots of different geos, because that’s what our publisher and marketer clients are looking for. Our clients don’t want to just get into Minnesota moms, they want something that’s measured in multiple states across the Eastern seaboard. Or in the case of our European clients, multiple countries.
Isn’t the fundamental problem with many big data tools that they can go broad or deep, but doing both is extremely difficult?
VAIDYA: This is why having a footprint, which lets us see 100% of the users 100% of the time, gives us the luxury of having a data set that is both broad and deep.
How do you build your solutions so they can handle various international privacy constraints?
CHAVEZ: We’re deep in Germany where the strictures are very high. Vivek and his team need to reach deep into the stack to accommodate those strictures. In Germany, the IP address is considered PII, so you have to strip away that part of the signal. It never crosses our frontier or penetrates our data collection engine. So we tune our data collection approaches to match the regulatory rules of the places we operate.
VAIDYA: In countries like Germany, which have different interpretations around what constitutes PII, we had to rework some layers of our stack to account for that. It’s not just the data collection piece that’s important. They still want the benefits that come with the data processing that happens with that data, they just don’t want it to happen outside of the EU. So not only have we had to change how we collect that data, we’ve had to come up with ways to process that data within the EU, after we reapply all of these obfuscation techniques we’ve developed, so it’s processed in conjunction with all of the other data we have. This way we have one consistent data set.
Who generally handles these privacy issues? You or the client?
CHAVEZ: With most of our customers, they just want us to handle it. So we make it easy by having embedded all of the policy management Vivek described so they’re not on the wrong side of the line. We have multiple clients in the US who have properties whose websites get hit inside the EU. They just need to know Krux is handling the data collection, so they’re not at odds with European regulators.
There are follow-on tool sets we provide especially for the Europeans that have embedded opt-in, opt-out choice control. Sanoma for example is a publisher that uses these tools embedded into their infrastructure. You don’t want a solution that just pretends not to target, but can still see users even after they’ve opted out. So we go one step further and black out any information that would have otherwise been available after the customers opted out from Sanoma’s system. Sanoma never sees or has registered in their system a data signature a consumer in their region has chosen to black out. For us, that’s the difference between tracking and targeting. You have to provide the tool sets to make it easy for publishers to implement. You have to take that last mile step and weave those targeting instructions deep into your infrastructure. That’s the level of privacy and security the EU guys especially require.
VAIDYA: We build tools in several layers of our stack that control which level of PII setting we’re operating in. The rest of the infrastructure knows how to handle those settings in terms of what constitutes PII, what doesn’t constitute PII, what to log, what not to log, where to process stuff, and so on and so forth.
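[Ed: A rough Python sketch of what a per-region PII setting might look like at the collection layer. Everything here is hypothetical: the `POLICIES` table, the `collect` function and the field names are ours. The only detail taken from the interview is that the IP address is treated as PII in Germany and that processing stays in the EU. The other entries are placeholders, not a statement of what any jurisdiction requires.]

```python
# Hypothetical per-region policies. Only the German IP rule comes from the
# interview; the rest is illustrative.
POLICIES = {
    "DE": {"pii_fields": {"ip", "email"}, "process_in": "EU"},
    "US": {"pii_fields": {"email"}, "process_in": "US"},
}
# Unknown regions get the strictest treatment in this sketch.
DEFAULT_POLICY = {"pii_fields": {"ip", "email"}, "process_in": "EU"}


def collect(event, region):
    """Drop PII fields before the event is stored, and tag where to process it."""
    policy = POLICIES.get(region, DEFAULT_POLICY)
    cleaned = {k: v for k, v in event.items() if k not in policy["pii_fields"]}
    cleaned["process_in"] = policy["process_in"]
    return cleaned


event = {"ip": "192.0.2.10", "email": "reader@example.com", "page": "/sport"}
german_event = collect(event, "DE")
us_event = collect(event, "US")
```

[Ed: The stripped fields are removed before anything is written, so downstream layers never see them. They only read the `process_in` tag to decide where the record may be handled.]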
One of our earliest clients was in Europe so we had to build this infrastructure from the ground up very early on.