“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.
Today’s column is written by Martin Kihn, senior vice president of marketing strategy, Salesforce Marketing Cloud.
In late April, Google announced in-market tests for some of the proposals in its Privacy Sandbox, where the cookie-free web is being born. In a GitHub post about the tests, Google’s “RTB team” said it wanted to poke at the “viability of … proposals via small-scale real-world experiments conducted by exchanges and bidders.”
Still sketchy and short, the Sandbox proposals are debated in forums such as the W3C’s Privacy and Web Incubator Community Groups and its Improving Web Advertising Business Group. So far, these forums have been dominated by highly credentialed, privacy-focused software engineers, not advertising boosters.
That’s why Google’s experiments are so important. They represent a tangible Phase 2 in the rapidly moving rollout of the post-cookie web. And the specific proposals in question point the way toward what that web may actually look like in 2022, when the last holdout – Google’s Chrome browser – finally empties the cookie jar.
In a phrase: It will be very different.
Shepherd of the FLoC
Among four Sandbox proposals singled out for testing, two are most relevant for ad buyers: “Federated Learning of Cohorts” (FLoC) and the colorfully named “Two Uncorrelated Requests, Then Locally-Executed Decision On Victory” (TURTLEDOVE). Both were proposed by Google engineers.
Federated learning is a technique that lets a bunch of different nodes – such as browsers or smartphones – build machine-learning models and upload parameters to a master model without sharing user-level data. In one application, Google used it to train smartphone keyboards to predict the next word a user will type – you know, the guess-the-next-word feature – while keeping individual messages on the phone.
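The core idea can be sketched in a few lines. In this toy version – which is illustrative, not the Sandbox’s actual protocol – each node “trains” a trivial local model (just the mean of its data) and uploads only the parameter and a sample count; the server combines them without ever seeing raw data.

```python
# Minimal sketch of federated averaging: each node fits a tiny local
# model (here, just the mean of its data) and uploads only the model
# parameter and its sample count -- never the raw, user-level data.
# All function names and data are invented for illustration.

def train_local(samples):
    """'Train' on-device: return (parameter, sample_count)."""
    return sum(samples) / len(samples), len(samples)

def federated_average(node_updates):
    """Server-side: combine parameters, weighted by sample count."""
    total = sum(n for _, n in node_updates)
    return sum(p * n for p, n in node_updates) / total

# Three 'browsers', each holding private data that never leaves the node.
nodes = [[1.0, 2.0, 3.0], [4.0, 4.0], [10.0]]
updates = [train_local(d) for d in nodes]
global_param = federated_average(updates)  # equals the mean of all data
```

The weighted average recovers exactly what a central server would have computed with all the raw data, which is the point: the aggregate is shared, the individual observations are not.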
In the FLoC version, each browser captures data on its user’s behavior: the websites they visit, the content of those sites and the actions they take there. That data is used to build a local model whose parameters are shared with a master model on a trusted server. In this way, each browser can be assigned to a cluster (or “flock”) based on its user’s browsing behavior.
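One plausible clustering mechanism – a locality-sensitive hash over browsing history, in the spirit of SimHash – can be sketched as follows. The 16-bit label space, the four-hex-digit format and the domain features here are all assumptions for illustration, not details from the proposal.

```python
import hashlib

# Toy locality-sensitive (SimHash-style) cohort assignment: browsers
# with similar browsing histories tend to land in the same flock.
# The 16-bit label space and domain features are illustrative only.

def simhash_flock(domains, bits=16):
    weights = [0] * bits
    for d in domains:
        h = int(hashlib.sha256(d.encode()).hexdigest(), 16)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    label = sum(1 << i for i, w in enumerate(weights) if w > 0)
    return format(label, "04X")  # a random-looking label like "43A7"

a = simhash_flock(["news.example", "suits.example", "travel.example"])
b = simhash_flock(["news.example", "suits.example", "travel.example"])
# Identical histories always map to the same flock label.
```

The label reveals nothing readable about the history that produced it – which is exactly why decoding what a flock means becomes its own problem, as discussed below.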
Flocks have random labels such as “43A7.” To use them for targeting, an advertiser would have to discover which flocks contain target customers and which do not. Armed with such info, the advertiser could bid appropriately on RTB exchanges when an impression with a particular flock label appears.
Some obvious questions: How many flocks will there be? And how do we decode the labels?
“How many?” is a statistical question with no easy answer. Given the scale of the web, many thousands are feasible without threatening anyone’s privacy. What the flocks mean is more ambiguous. In machine-learning terms, each flock is a cluster, so its definition is opaque. Flock labels could be semi-public information, similar to mobile IDs, shared with the websites we visit. Sites with a large number of visitors could analyze the behavior of individual flocks – perhaps using Google Analytics – and start to see patterns.
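One privacy constraint behind the “how many?” question is that every flock must be large enough to hide any individual in the crowd. A k-anonymity-style guard – with a wholly invented threshold and merge rule – might look like this:

```python
# Sketch of a k-anonymity-style guard: any flock with fewer than
# k browsers is folded into a generic "unassigned" bucket so no
# small group can be singled out. The threshold k is illustrative.

def enforce_min_size(flock_counts, k=1000):
    """Fold any flock smaller than k into a generic bucket."""
    safe = {}
    for label, count in flock_counts.items():
        key = label if count >= k else "unassigned"
        safe[key] = safe.get(key, 0) + count
    return safe

counts = {"43A7": 5200, "22H8": 800, "17C9": 12000}
safe_counts = enforce_min_size(counts)
```

The tension is plain from the sketch: a larger k means stronger privacy but fewer, coarser flocks for advertisers to work with.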
In a simple scenario, a retailer might notice high-end suit buyers tend toward flock “22H8,” while sale-priced sweat-suiters lean to “17C9.” If the correlation is strong, bidding strategies could be developed and campaigns be – ahem – tailored to either flock, or both, as the label is exposed in the bid stream.
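The retailer’s analysis above amounts to computing a conversion rate per flock label seen in its own traffic and bidding up the flocks that clear a threshold. A sketch, with all counts and the threshold invented:

```python
# Sketch of the retailer's flock analysis: conversion rate per label,
# then a target list for bidding. All numbers are invented.
from collections import Counter

impressions = Counter({"22H8": 10000, "17C9": 8000, "43A7": 9000})
suit_purchases = Counter({"22H8": 300, "17C9": 24, "43A7": 45})

conv_rate = {f: suit_purchases[f] / impressions[f] for f in impressions}

# Flocks whose rate clears a threshold become bid-up targets.
target_flocks = {f for f, r in conv_rate.items() if r >= 0.01}
```

Nothing here requires knowing who is in flock “22H8” – only that, in aggregate, it buys suits. That is the sense in which the data-rich get richer: more traffic means tighter rate estimates per flock.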
As the proposal’s author points out, there’s a challenge with sensitive data and what labels consumers will accept. And companies, publishers and ecosystems with more traffic will see more flocks and behaviors. The data-rich will get richer. It is easy to imagine a thriving market around identifying what flock labels mean. Those that point to, say, insurance buyers or swing voters could be very valuable to certain parties. Existing data management platforms could become a kind of phone book for FLoCs.
Flying with the TURTLEDOVE
Assuming some version of FLoCs passes into production, flock-level bidding might not be all that much different from audience buying today. After all, no advertiser runs a different campaign for each person; we always deal with aggregates. The biggest difference between 2022 and today is the end of user-level targeting.
Unfortunately, advertisers have come to rely heavily on user-level targeting for results. The techniques hardest hit by the end of the cookie will be difficult to replace:
- Frequency capping
- User-level attribution
TURTLEDOVE is an ingenious attempt to enable some form of retargeting and shows how the browser could subsume ad tech. Its main moves are to separate data about behavioral intent (what the user wants) and context (where the user is now); and to run the ad auction inside the browser itself.
As with FLoC, the browser is the sentinel and lockbox, watching what the user does and storing observations locally. Say a user visits Widgets.com. The browser will place that user in a Widgets “interest group” based on their behavior and store that label; it can also pull information from the brand (Widgets Inc.), such as bids, bidding logic, ads.txt sellers and ad units – in short, everything needed to run a campaign.
Later, when that same browser appears on Pub.com (or an ad network), Pub.com will send contextual data to the browser, which runs an auction and declares a winner. By separating the “interest group” from the context, neither the advertiser nor the publisher learns much about the person seeing the ad. At least, that’s the idea.
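The two “uncorrelated requests” and the locally executed decision can be sketched as below. Every field name and number is a guess at the mechanics for illustration – the proposal’s actual API is still being debated in the forums mentioned above.

```python
# Sketch of TURTLEDOVE's flow: one request fetched earlier for the
# interest group, one contextual request at render time, and the
# auction decided locally in the browser. Field names are invented.

# (1) Stored days ago, when the browser visited Widgets.com:
interest_group_ads = [
    {"ad": "widgets_retarget", "bid": 2.50},
]

# (2) Fetched now from Pub.com, based only on page context:
contextual_ads = [
    {"ad": "generic_sports", "bid": 1.10},
    {"ad": "local_dealer", "bid": 1.75},
]

def local_auction(*candidate_lists):
    """Locally-executed decision on victory: highest bid wins."""
    candidates = [ad for ads in candidate_lists for ad in ads]
    return max(candidates, key=lambda ad: ad["bid"])

winner = local_auction(interest_group_ads, contextual_ads)
# Neither request's sender learns which candidate won, or why.
```

Because the two requests never meet outside the browser, Pub.com sees only a contextual request and Widgets Inc. sees only its own interest group – the join happens on the device.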
Challenges abound. For example, interest groups aren’t updated in real time (there’s a time lag, for privacy), so retargeting is less timely. Brand safety is difficult to enforce. Complex auctions and logic may be a burden. There will certainly be fewer “interest groups” than there are retargeting options today. How many is enough?
Answering these and other questions is the purpose of the experiment phase.
Two conclusions and a question
Where does this leave us in our attempt to foresee the web of 2022? Some conclusions are clearer than others, at this early stage:
- Personas: The future is aggregate, not individual. Tactics such as retargeting will have to be designed for larger cohorts, not individuals. These cohorts will need detailed models of behavior and actual lifetime values. Bids will be based on better models of expected value.
- Customer data: Lacking third-party data, advertisers need another way to build personas. The answer – as everyone is telling you – is first-party data. Most advertisers are going to need more of it, collected with consent, both pseudonymous and known. They are going to need more partners willing to share special data sets. Otherwise, they’re going to have to be very good at market research, pay a premium and waste a lot of impressions. It’s difficult to see how big publishers and walled ecosystems with large data sets don’t win.
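In the simplest case, a cohort-level expected-value bid reduces to probability of conversion for the cohort times modeled lifetime value, less a margin. The formula and every number below are invented for illustration:

```python
# Simplest possible cohort-level expected-value bid: probability of
# conversion for the cohort times modeled lifetime value, with a
# margin factor. All numbers are invented for illustration.

def cohort_bid(p_conversion, lifetime_value, margin=0.30):
    """Max CPM bid = expected value per impression, less margin."""
    ev_per_impression = p_conversion * lifetime_value
    return ev_per_impression * (1 - margin) * 1000  # per-mille (CPM)

bid = cohort_bid(p_conversion=0.0002, lifetime_value=120.0)
```

The hard part, of course, is not the arithmetic but estimating `p_conversion` and `lifetime_value` per cohort – which is exactly where first-party data and market research come in.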
And a final question. Many of the Sandbox proposals rely on a “trusted server” (or brain) to act as coordinator and conductor. This server could hold the keys to wisdom and wealth. Who owns it? Is it Google?
That too may be a message in the Sandbox.