How Self-Supervised Learning In AI Can Reduce Reliance On User Data

Melinda Han Williams, Chief Data Scientist at Dstillery

The “P” in “ChatGPT” is perfect for cookieless targeting.

Here “P” stands for pre-trained. It’s an aspect of the latest generation of AI models that deserves a closer look from programmatic advertisers and agencies, along with the concept that powers it: self-supervised learning. 

Self-supervised learning is at the heart of generative AI, and it’s perfectly suited to address the signal loss we’re increasingly facing in digital advertising today. 

Using self-supervised learning, AI targeting models can build up a body of knowledge about digital behavior that lets the AI do more with less data when targeting ads for a specific brand.

Pre-training lesson

Pre-training is the stage in which an AI model builds a foundation of knowledge before it ever tackles an individual prompt. AI models like ChatGPT learn by predicting the next word in a sentence, across millions of sentences pulled from the internet. Before you ask ChatGPT your first question, this pre-training step has already given the model an enormous breadth of knowledge, which is why it can give a surprisingly detailed answer to even a brief prompt.

This is an example of what’s called self-supervised machine learning. Self-supervised learning is when an AI model learns from a data set that doesn’t include labeled examples or other explicit guidance on what the AI model should learn. By using any readily available text, ChatGPT supervises itself to learn what each word means. It does this by guessing the next word in a sentence, then checking the answer and correcting itself. It plays this game of guess-and-check millions of times over.
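To make the guess-and-check idea concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not how ChatGPT works internally (large language models use neural networks), and the function names and sample sentences are made up for illustration. The point it shows is that the "label" for each guess is simply the next word already sitting in the raw text.

```python
from collections import Counter, defaultdict


def train_next_word(sentences):
    """Learn next-word counts by guessing, checking and correcting.

    Toy illustration only: a real language model learns neural network
    weights rather than a table of counts.
    """
    counts = defaultdict(Counter)  # counts[word][next_word] -> times seen
    correct = 0
    total = 0
    for sentence in sentences:
        words = sentence.lower().split()
        for current, actual_next in zip(words, words[1:]):
            # Guess: the most frequent follower seen so far (None if unseen).
            seen = counts[current]
            guess = seen.most_common(1)[0][0] if seen else None
            # Check: the answer comes from the text itself, not a human label.
            correct += guess == actual_next
            total += 1
            # Correct: record what actually came next.
            seen[actual_next] += 1
    return counts, correct, total


def predict_next(counts, word):
    """Return the most likely next word, or None for an unseen word."""
    followers = counts.get(word.lower())
    return followers.most_common(1)[0][0] if followers else None


# Hypothetical sample text; any readily available text would do.
sentences = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the cat sat down",
]
counts, correct, total = train_next_word(sentences)
print(predict_next(counts, "the"), predict_next(counts, "cat"))
print(f"{correct} of {total} guesses were right during training")
```

Nobody labeled anything here. The training examples fall out of the word order, which is what lets the same game scale to millions of sentences.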

The self-supervised nature of AI models like ChatGPT is what makes them so useful. The key to their success is that they can learn from data that’s already plentiful – data that wasn’t created specifically for training AI. 

This differentiates self-supervised learning from classic supervised learning. With supervised learning, if you want your AI to learn something, you need a data set with specifically labeled examples to guide the learning process. This data is always limited. Creating or acquiring it often comes at a cost. (There is an episode of “Silicon Valley” in which a character tries to trick a classroom of college students into labeling images as a homework assignment so he can use them to train an AI image identification system.) Self-supervised learning removes that barrier. The AI model can learn from data that’s already out there, without any special labels. 

Self-supervised learning enables pre-training an AI model on massive amounts of general-purpose data. That way, it can bring a ton of knowledge to the table in response to a specific prompt.

Applying self-supervised learning to programmatic advertising

What does self-supervised learning have to do with signal loss in programmatic advertising?

At the core of the signal loss problem is this question: How do you make an ad-targeting decision without the wealth of user-specific data we’ve all been spoiled with? Using just the information about the impression moment itself, like URL, time of day and DMA, how do you answer the question, “How valuable is this impression to this brand’s campaign?”

This is where pre-training comes in. Like ChatGPT, programmatic advertising needs AI models that harness their own foundation of knowledge in response to these impossibly tiny prompts. But, in this case, the knowledge we need isn’t about language; it’s about digital behavior. Rather than pre-training on sentence data, an ad targeting AI model would need to pre-train on the patterns and nuances of digital behavior. 

Digital journeys from opted-in panels, collected outside of the advertising ecosystem, are ideal for this self-supervised learning task. Where ChatGPT learns by predicting the next word in a sentence, an ad targeting AI model can learn by predicting the next website in a digital journey. 

As with other self-supervised learning processes, this allows the AI model to learn an impressive breadth of knowledge. In this case, the AI model learns what a visit to each website means, the intent behind each visit and that visit’s role in someone’s digital journey.

This allows the AI model to bring a wealth of knowledge in response to that tiny prompt – “How valuable is this impression moment to this brand’s campaign?” – and produce an impressively accurate response.
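As a rough sketch of how that could work, the Python below pre-trains on unlabeled digital journeys by counting which site follows which, then uses what it learned to score a single impression. Everything in it is an assumption made for illustration: the site names, the function names, the two-visit horizon and the definition of "value" as the chance that a journey reaches the brand's site soon after the impression. It uses only the URL, ignoring signals like time of day and DMA, and it does not describe any vendor's production model, which would more likely be a neural network.

```python
from collections import Counter, defaultdict


def pretrain_transitions(journeys):
    """Estimate P(next site | current site) from unlabeled journeys."""
    counts = defaultdict(Counter)
    for journey in journeys:
        # Each consecutive pair of visits is a free training example.
        for current, nxt in zip(journey, journey[1:]):
            counts[current][nxt] += 1
    return {
        site: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for site, followers in counts.items()
    }


def impression_value(transitions, url, brand_site, steps=2):
    """Chance that a journey at `url` reaches `brand_site` within `steps` visits."""
    if steps == 0:
        return 0.0
    value = 0.0
    for nxt, p in transitions.get(url, {}).items():
        if nxt == brand_site:
            value += p  # reached the brand's site on this visit
        else:
            # Otherwise keep following the journey with one fewer visit left.
            value += p * impression_value(transitions, nxt, brand_site, steps - 1)
    return value


# Hypothetical journeys standing in for opted-in panel data.
journeys = [
    ["recipes.example", "cookware.example", "brand.example"],
    ["recipes.example", "cookware.example", "news.example"],
    ["news.example", "sports.example"],
    ["recipes.example", "news.example", "sports.example"],
]
transitions = pretrain_transitions(journeys)

# The "tiny prompt": one URL, one brand, no user-level data.
recipes_value = impression_value(transitions, "recipes.example", "brand.example")
news_value = impression_value(transitions, "news.example", "brand.example")
print(recipes_value, news_value)
```

In this toy data, an impression on the recipe site scores higher than one on the news site because journeys through the recipe site tend to lead toward the brand. The scoring step needs nothing about the individual user; the knowledge lives in the pre-trained transition table.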

Do more with less data

The industry faces plenty of uncertainty as we prepare for the deprecation of third-party cookies in 2024. But one thing is certain: advertisers will need to do more with less user data.

Pre-training with self-supervised learning provides a way for AI models to bring their own foundation of knowledge to the table, so that less data is needed to make each buying decision within a campaign. It’s an approach with the potential to eliminate the reliance on user-level data for effective targeting. That makes self-supervised learning a rare technology that can simultaneously support both consumer privacy and advertiser effectiveness.

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.
