ERIC ROZA: There’s a real trade-off between accuracy and scale, but there’s a sweet spot – and that’s what we call relevant reach.
You might want to reach 25-to-45-year-old females, but you don’t want to reach 25-to-45-year-old females who just bought a brand new sports car if you’re selling Ford F-150s. We’re focused on combining demographics with rich signals in an area, and we’re looking to scale it geographically.
Reconciling reach with precision isn’t a new problem. Why hasn’t it been successfully addressed before?
We can train our models on really precise – but smaller – data sets, like age and gender, then apply that to the whole breadth of data.
For example, you might find a cookie tagged with female, female and male by three different demographic providers on BlueKai. What do you do?
Do you just flip a coin? Ignore it and serve it up both ways? Or serve it up as female because you’ve got two females to one male? None of those is the right answer.
You look at how well each of those providers has done historically. And you also look at what else you know about that cookie. If you know that cookie has also gone to automotive enthusiast sites and sites about weightlifting versus sites about shopping for women’s clothes, you can make predictions that are better than the original data set.
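The approach Roza describes – weighting each provider’s vote by its historical track record instead of flipping a coin or taking a simple majority – can be sketched roughly like this. This is a minimal illustration, not BlueKai’s actual system: the provider names, accuracy numbers, and log-odds combination rule are all assumptions.

```python
import math

# Hypothetical historical accuracy per demographic provider
# (the share of its past labels that agreed with a truth set).
PROVIDER_ACCURACY = {
    "provider_a": 0.90,
    "provider_b": 0.70,
    "provider_c": 0.55,
}

def resolve_gender(votes):
    """Combine conflicting gender labels by weighting each provider's
    vote with the log-odds of its historical accuracy."""
    score = 0.0  # positive => "female", negative => "male"
    for provider, label in votes.items():
        acc = PROVIDER_ACCURACY[provider]
        weight = math.log(acc / (1.0 - acc))  # log-odds of being right
        score += weight if label == "female" else -weight
    return "female" if score > 0 else "male"

# The scenario from the interview: two providers say "female",
# one says "male". The weighted vote resolves the conflict.
votes = {"provider_a": "female", "provider_b": "female", "provider_c": "male"}
print(resolve_gender(votes))  # -> female
```

Note that under this scheme a single highly accurate provider can outweigh two weak ones – which is exactly why two-to-one majority voting is, as Roza says, not the right answer. Behavioral signals (the automotive or weightlifting sites a cookie has visited) could enter the same score as additional weighted evidence.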
What do you need to build on to improve relevant reach?
There’s always [another big data asset] coming down the pipeline. For example, we were talking with Visa for over five years at Datalogix. About a year ago, we broke through with our partnership discussions. It became a really differentiating aspect in the eyes of their merchants. Now, Visa is a great partner with us in the data space.
So the more data assets that are created and liberated, the more value gets added to the ecosystem.
How does all this differ from what you used to do?
Five years ago, we’d match cookies with somebody. We’d have partnerships and pay someone to synchronize our cookies. About three years ago, we needed to verify for ourselves whether those cookies were who they said they were.
We’d bring in third-party truth sets and do triangulation. We’d find that with a certain provider, about 80% of the cookies they gave us were crap.
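The triangulation step can be sketched as scoring a provider’s cookies against an independent truth set – checking, for the cookies that appear in both, how often the labels agree. A minimal sketch with toy data; the function name and the data are illustrative assumptions, not Datalogix’s implementation:

```python
def score_provider(provider_labels, truth_set):
    """Score a cookie provider against a third-party truth set:
    the fraction of overlapping cookies whose label agrees.
    Only cookies present in both sets can be checked."""
    overlap = provider_labels.keys() & truth_set.keys()
    if not overlap:
        return None  # no basis to score this provider
    agree = sum(1 for c in overlap if provider_labels[c] == truth_set[c])
    return agree / len(overlap)

# Toy data: five cookies from a provider, all present in the truth set,
# but only one label agrees.
provider = {"c1": "f", "c2": "f", "c3": "m", "c4": "f", "c5": "m"}
truth    = {"c1": "f", "c2": "m", "c3": "f", "c4": "m", "c5": "f"}
print(score_provider(provider, truth))  # -> 0.2
```

A score of 0.2 here corresponds to the “80% crap” case Roza describes; the same check applies unchanged to mobile ad IDs.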
We also do the same thing with mobile ad IDs, and over the last six months we’ve started rolling out scored mobile ad IDs.
How do you find data partners?
We look for people with new signals to provide. Someone with an audience that’s very different in some way. Volume is great, but if we have 90% of the matches they offer, that’s not interesting to us.
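The volume-versus-novelty test Roza applies – how much of a prospective partner’s audience is actually new – reduces to a simple set calculation. A hedged sketch; the function name and ID scheme are made up for illustration:

```python
def incremental_reach(our_ids, partner_ids):
    """Fraction of a prospective partner's IDs we do NOT already have.
    If we already match 90% of what they offer, incremental reach is
    only 10% -- 'not interesting', in Roza's terms."""
    if not partner_ids:
        return 0.0
    new_ids = partner_ids - our_ids
    return len(new_ids) / len(partner_ids)

ours    = {f"id{i}" for i in range(100)}       # IDs we already have
partner = {f"id{i}" for i in range(90, 110)}   # 10 of their 20 overlap
print(incremental_reach(ours, partner))  # -> 0.5
```

Note the asymmetry: the overlapping IDs aren’t worthless – per the next point in the interview, a partner’s data on *existing* IDs can still help train the models – but only the non-overlapping portion adds reach.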
And we also look to see if they have data on our existing IDs that can help train our models. I don’t think anyone else in the market is taking this algorithmic approach to things.
What do you mean by “taking this algorithmic approach to things”? I would have assumed others out there do that as well.
They’re just joining things. They’ll get 3 million IDs from this guy, another million from that guy, then they’ve got 4 million and they’ll put it out there.
That’s what we believe everyone in the market does, other than us.
We get 3 million from this guy, 1 million from that guy, then we put them together to figure out who’s right when they disagree with each other – and which ones we shouldn’t use at all. We score everyone, so when there’s a conflict between two sources, we know which one to use.
So how did you used to handle that sort of conflict?
We were doing what the rest of the industry is doing: working with sources we trusted, doing our checks, getting our integrations as good as they could be, and fulfilling audiences against them.
But we always knew we could do better.