Take A Deep Breath And Consider The Benefits Of Google’s Topics API

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Ruben Schreurs, group chief product officer at Ebiquity.

All aboard the “Topics API sucks” bandwagon!

Ever since the blog post by Vinay Goel, product director for the Privacy Sandbox, announcing Google’s Topics API proposal went live last week, my channels have felt like an industrywide echo chamber filled with Google bashing.

I understand that people are upset about recent revelations and evidence in lawsuits against Google. But I feel that heightened emotions on these other matters are getting in the way of a pragmatic assessment of this specific Topics concept.

I’m relatively late to the party, as I spent several days reading through the documentation, tuning into the debate happening around the concept and contemplating the pros and cons – and even now, I feel comfortable sharing only a provisional opinion on its merit. It’s not as straightforward as some are making it seem, because many key questions remain unanswered, but here goes nothing:

I actually like the Topics API concept.

There, I said it.

It’s important to emphasize the word “concept,” because that is very much what it still is at this stage. Google is actively engaging with anyone who has an opinion to try to answer open questions and help it decide on key features and limitations of the API, such as:

Should sites be able to set their topics, or should topics be determined by the browser or some third-party entity?
What should happen if a site disagrees with the topics assigned to it by the browser?
What topic taxonomy should be used? Who should create and maintain it?
What standard might be used for determining which topics are sensitive?

Why is this important? Because Google has acknowledged the flaws in FLoC and is trying to shape a utility that still allows for some form of interest-based advertising, but without the privacy problems and myriad other issues connected to the use (and misuse) of third-party cookies.

The Topics API is not finalized, and everything is subject to change as Google incorporates ecosystem feedback and iterates on it. While many seem to dismiss this feedback-gathering process as window dressing, I’m willing to give Google the benefit of the doubt and plan to allocate resources to contribute to the decisions that need to be made.

It’s going to be a long road. But let’s break down the concept of the Topics API as it currently stands.

The Topics API aims to provide an interest-based targeting utility to any “callers” on a webpage. What does this really mean?

Third parties (as in non-Google companies) will be able to receive several “topics” that a website visitor may be interested in based on their browsing history of the previous three weeks. For example, if I visited foodfordogs.com last week, the “pets & animals/pets/dogs” topic may be made available to advertising companies when I visit CNN.com, which would then enable a dog food company to bid on serving me an ad.

The brand would not know who I am or have any further profiling data on me, but it can use the topic to increase the likelihood of its ad being relevant to me.

How are topics assigned to a user’s browser?

The current technical documentation states that this will be done based on “hostnames” – which is a very important point. A hostname is the domain portion of a URL, such as example.com or sports.example.com. It does not include anything else from the full URL string, meaning we cannot use hostnames to differentiate between example.com/sports and example.com/finance.
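To make the hostname point concrete, here is a minimal Python sketch of what is left once everything but the hostname is dropped from a URL. The helper name hostname_of is my own for illustration and is not part of the Topics API; the URLs are the placeholder examples from above.

```python
from urllib.parse import urlsplit


def hostname_of(url: str) -> str:
    """Return only the hostname of a URL, dropping path, query and fragment."""
    return urlsplit(url).hostname or ""


# Two sections of the same site collapse to the same hostname...
sports = hostname_of("https://example.com/sports")
finance = hostname_of("https://example.com/finance?page=2")

# ...while a subdomain is a different hostname.
subdomain = hostname_of("https://sports.example.com/scores")

print(sports, finance, subdomain)
```

In other words, a publisher that keeps all of its sections under one hostname would, under the current concept, look like a single site to the topic classifier.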

This is critical, because many people seem to misunderstand how topics will be assigned to a user’s browser, assuming, for example, that it involves scraping the page contents of websites a user has visited or analyzing data from email contents or search strings.

Google aims to link each website hostname to zero or more topics. There is currently no fixed limit, although the expected range is between one and three topics per hostname.

The one glaring question is whether websites should be allowed to set – or override – the topics they are assigned. Luckily, Google acknowledges this and is asking for feedback and suggestions from the industry. This needs to be dealt with, because websites could theoretically manipulate which topics are placed and spam the API with the most valuable topics without actually hosting related content.

Which topics can be assigned?

Again, Google is asking for industry involvement and even states that “the eventual goal is for the taxonomy to be sourced from an external party that incorporates feedback and ideas from across the industry.” The current concept taxonomy can be found here and includes 349 different topics.

Built-in transparency and the open-source nature of decision-making about the taxonomy and the models used to assign topics to website hostnames are critical and could enable a robust way to allow for interest-based targeting in advertising without exposing users to privacy and data protection risks. The taxonomy will be curated and made available for audits, and users will be able to control and change the topics assigned to them or opt out of the Topics API entirely.

How are topics used to target based on interests?

Very simply put, the Topics API will return up to three distinct topics from a user’s browser. The topics will be three weeks old at most and generated based on the hostnames of websites visited by the user.

Every week, five “top topics” will be calculated using local browser information, meaning this does not happen on some obscure cloud server outside the user’s control. The idea is to randomly assign an additional sixth topic in order to introduce “noise” that makes it even more difficult to fingerprint users by creating and tracking distinct combinations of topics and linking them back to individual users.

The top five topics are to be selected based on a week’s worth of accumulated Topic IDs for “eligible” visits (i.e., websites that used the API and users who have not opted out of individual topics or the entire Topics API). From all these topics, the top five that occur most frequently based on a ranking system will be selected and, together with the randomized topic, make up a user’s list of topics for that week.
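As a rough illustration of that weekly calculation, here is a minimal Python sketch. Everything in it is an assumption on my part: the hostname-to-topic lookup, the tiny taxonomy (only the dog topic comes from my earlier example), the plain frequency count standing in for the “ranking system,” and the choice to draw the noise topic from topics not already in the top list. It is not how the browser will actually do it.

```python
import random
from collections import Counter

# Hypothetical hostname-to-topic lookup and taxonomy, for illustration only.
HOSTNAME_TOPICS = {
    "foodfordogs.com": ["pets & animals/pets/dogs"],
    "news.example": ["news"],
    "recipes.example": ["food & drink/cooking"],
}
TAXONOMY = [
    "pets & animals/pets/dogs",
    "news",
    "food & drink/cooking",
    "sports",
    "travel",
]


def weekly_topics(visited_hostnames, rng, top_n=5):
    """Return (top topics by visit frequency, one random noise topic)."""
    counts = Counter()
    for hostname in visited_hostnames:
        # Hostnames with no known topics contribute nothing.
        counts.update(HOSTNAME_TOPICS.get(hostname, []))
    top = [topic for topic, _ in counts.most_common(top_n)]
    # Assumption: the noise topic is drawn from topics not already selected.
    remaining = [topic for topic in TAXONOMY if topic not in top]
    noise = rng.choice(remaining) if remaining else None
    return top, noise


visits = ["foodfordogs.com", "news.example", "foodfordogs.com", "unmapped.example"]
top_topics, noise_topic = weekly_topics(visits, random.Random(0))
print(top_topics, noise_topic)
```

Note that the whole calculation needs nothing more than the list of visited hostnames, which is why it can run locally in the browser.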

There may be a weighting model that influences ranking, for example, to ensure that more granular topics are considered as a way to add value. This information and the weighting methodology will either be made public or perhaps even built and operated by an external partner. All topics and top topic lists will be deleted after the third week to ensure a level of relevance through recency and to prevent long-term buildup of profile data.
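The three-week retention can be pictured as a rolling window. The Python sketch below is only that, a picture: the class name TopicHistory, the topic labels and the rule of picking one topic per stored week are my own assumptions, not something the documentation I have described here spells out.

```python
import random
from collections import deque


class TopicHistory:
    """Keeps weekly topic lists and silently drops anything older than three weeks."""

    def __init__(self, weeks_kept=3):
        # A bounded deque discards the oldest week when a new one is appended.
        self.weeks = deque(maxlen=weeks_kept)

    def close_week(self, topics):
        self.weeks.append(list(topics))

    def topics_for_caller(self, rng):
        """Return up to three distinct topics (assumption: one pick per stored week)."""
        picked = []
        for week in self.weeks:
            if not week:
                continue
            choice = rng.choice(week)
            if choice not in picked:
                picked.append(choice)
        return picked


history = TopicHistory()
history.close_week(["old-topic"])        # week 1, will age out
history.close_week(["news"])             # week 2
history.close_week(["sports", "travel"]) # week 3
history.close_week(["news", "cooking"])  # week 4 pushes week 1 out
returned = history.topics_for_caller(random.Random(0))
print(returned)
```

Whatever the final selection rule turns out to be, the point is that nothing from the fourth week back can be returned, because it is no longer stored.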

It’s worth noting that any third party that calls the API will only ever be provided with topics that have been added to the user’s browser on a website where this third party was also present. If, for instance, I visited foodfordogs.com last week, but ad tech company X does not have its technology on the website, it will not receive the “pets & animals/pets/dogs” topic when it calls the API for my topics on CNN.com.
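A minimal Python sketch of that caller-specific filter might look like this. The class ObservationLog, the caller names and the “news” topic are hypothetical and exist only to illustrate the rule; the real bookkeeping would live inside the browser.

```python
from collections import defaultdict


class ObservationLog:
    """Tracks which topics each caller has been present for."""

    def __init__(self):
        self._seen = defaultdict(set)

    def record_visit(self, topics, callers_present):
        # Only callers embedded on the visited site get credit for its topics.
        for caller in callers_present:
            self._seen[caller].update(topics)

    def filter_for(self, caller, user_topics):
        """Return only the user's topics that this caller has itself observed."""
        seen = self._seen.get(caller, set())
        return [topic for topic in user_topics if topic in seen]


log = ObservationLog()
# Ad tech company X is not on foodfordogs.com, but company A is.
log.record_visit(["pets & animals/pets/dogs"], callers_present=["adtech-a"])
# Both are present on a news site.
log.record_visit(["news"], callers_present=["adtech-a", "adtech-x"])

user_topics = ["pets & animals/pets/dogs", "news"]
for_x = log.filter_for("adtech-x", user_topics)
for_a = log.filter_for("adtech-a", user_topics)
print(for_x, for_a)
```

Company X walks away with only the topic it could have inferred from its own presence, which is the point of the restriction.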

Compared to what we do now, this sounds like a very inaccurate way to profile and target individuals with personalized advertising.

Yes, that’s right! Because targeting individuals by profiling and tracking them across the web via third-party cookies or (most) alternative ID solutions is flawed and rarely (if ever) compliant with active privacy regulations.

Third-party cookies will cease to exist, and I have yet to see a valid alternative ID solution that can sustain the accustomed level of profiling and targeting in a compliant way. It’s time to wake up and smell the coffee. The system that has been used for years, even after the introduction of the GDPR and other similar regulations, is nearing its end.

And, frankly, I’m surprised it lasted this long.

You should consider the Topics API concept as a way to have at least some method for targeting users based on interests across different websites. Without the Topics API or a different alternative that is compliant, safe and fair to consumers, there will be no way to add relevance to online advertising beyond contextual targeting or first-party audiences operated by publishers.

Maybe that wouldn’t be a bad thing, either. I haven’t been all that impressed by so-called alternative identity solutions that use covert fingerprinting techniques or hashed emails as identifiers. But I see the Topics API initiative as a potentially viable, safe and valuable way to sustain a level of relevant targeting.

So, how is the Topics API different from FLoC?

What bugs me is that so many people position the Topics API as a reskinned version of FLoC, which I fundamentally disagree with.

Based on the current documentation, it’s a distinct concept with a much stronger emphasis on human curation, privacy safeguards and controls for the end user. And, contrary to FLoC in its initial experiment, generating a user’s topics is only possible when websites actually implement and use the API.

The main difference – and advantage – is the focus on preventing fingerprinting. Based on the current concept, it would be nearly impossible to create distinct user identifiers based on a person’s set of assigned topics. Even so, there are still certain risks and considerations around this, some of which are outlined here.

Well, what do we do now?

Scrutinize and contribute. Don’t shout from the sidelines. Get involved and influence the decision-making.

Yes, there are valid concerns and objections about anticompetitive behavior from Google, and I, too, eagerly await more information and judgment on the live lawsuits against Google.

But, provisionally, I think the Topics API initiative may be the start of something that could work, and the intentions and rationale seem genuine – to the extent that I can judge at this stage. I hope Google will remain committed to being fully transparent about the mechanics, modeling and infrastructure that will be built to support this. I hope Google builds in user access and controls by design. I hope Google lives up to its stated commitments to work with external partners.

And, lastly, I hope Google doubles down on its responsibility to help eradicate sensitive categories and harmful ways to target people.

Follow Ruben Schreurs (@RubSchreurs) and AdExchanger (@adexchanger) on Twitter.
