RAG Against The Machine: Injecting First-Party Data Into AI Models For Better Results

Hugo Loriot, Head of Data & Technology Integration at The Brandtech Group

First-party data has long been in a marketer’s toolkit as a critical instrument to personalize the customer experience across media touch points. But it has yet to influence how most companies use generative AI technology. 

However, proprietary data sets have the potential to play an important role in several key marketing applications for generative AI today. That includes content production at scale, campaign insights generation and customer service. 

The technology and process already exist to take advantage of large collections of structured and unstructured data in the form of retrieval-augmented generation (RAG) pipelines. RAG is a recently developed process to ingest, chunk, embed, store, retrieve and feed first-party data into foundation models such as the ones provided by OpenAI, Google and Meta.

Here’s how to use these tools to inject first-party data into your next AI-enabled campaign.

It’s all about context

Whether you use GPT, Dall-E, Imagen, Gemini or Llama, each time you ask a generative AI application to produce text or imagery, you provide (primarily text-based) instructions. These instructions are sent to the model, which generates the desired response (a step called inference). The result is an insight, a piece of ad copy or a creative.

Every marketer and agency can benefit from Google and OpenAI’s petabytes of training data to do their job more effectively. There is a catch, though. Foundation models are not retrained often (maybe every other year). That means they are not always up to date. And while these models may appear omniscient, they are actually clueless about information that is specific to your brand and your customers. 

Why? Because this information is not publicly available. 

If you ask Dall-E or Imagen to adapt a master creative to each of your five key audience groups, they will certainly fall short and potentially hallucinate, because they have no idea who your audience is and what they might be interested in. 

The ability to contextualize a prompt with first-party data and knowledge is critical, not only to differentiate yourself from the crowd but also to help generative AI models provide useful results. If you add audience-specific data (such as demographics and preferences) to your prompt, you will get on-brand, customized creative at a fraction of the cost. 

Similarly, asking Gemini or GPT to write an Amazon product description between 500 and 800 characters for a new product will not give great results, unless you add examples of existing high-performing copy for other products in your catalog. This process is known as few-shot prompting. 
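
To make few-shot prompting concrete, here is a minimal sketch of how such a prompt could be assembled before it is sent to a model. The function name, the product names and the example copy are all hypothetical, invented for illustration; no particular model API is assumed.

```python
def build_few_shot_prompt(task, examples, new_product):
    """Assemble a few-shot prompt: instruction, worked examples, then the new item."""
    lines = [task, ""]
    for product, copy in examples:
        lines.append(f"Product: {product}")
        lines.append(f"Description: {copy}")
        lines.append("")
    # Leave the final description blank so the model completes it.
    lines.append(f"Product: {new_product}")
    lines.append("Description:")
    return "\n".join(lines)


# Hypothetical high-performing copy from an existing catalog.
examples = [
    ("Trail Bottle 750ml", "Keeps drinks cold for 24 hours. Leak-proof lid, fits standard cup holders."),
    ("Travel Mug 350ml", "Double-wall steel keeps coffee hot through the commute. One-hand open."),
]

prompt = build_few_shot_prompt(
    task="Write an Amazon product description between 500 and 800 characters.",
    examples=examples,
    new_product="Kids Bottle 400ml",
)
print(prompt)
```

The resulting text would be passed to whichever model you use. The examples show the model the tone and structure to imitate instead of leaving it to guess.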

RAG time

The ability to identify and extract proprietary knowledge and data is key to generating successful content. But what needs to be done to feed first-party data into a large language model (LLM)?

For generative AI, data management involves different technologies and pipelines than those that are used with other mar tech applications, such as customer data platforms. That’s because of the broader nature of first-party data and how it is consumed by LLMs. 

In traditional mar tech, when you want to address your “health-conscious moms” customer segment, you select all the rows in your database that return both “moms” and “health conscious” attributes. You export the resulting list of hashed emails to the target media platform. 
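
As a point of comparison, the traditional segment export can be sketched in a few lines. This is an illustration under assumptions: the customer rows and field names are hypothetical, and SHA-256 over a lowercased email is one common hashing convention, not something the workflow above prescribes.

```python
import hashlib

# Hypothetical customer rows; the field names are assumptions.
customers = [
    {"email": "ana@example.com", "attributes": {"moms", "health conscious"}},
    {"email": "ben@example.com", "attributes": {"moms"}},
    {"email": "cleo@example.com", "attributes": {"health conscious", "runners"}},
]


def hash_email(email):
    """Normalize an email address, then hash it with SHA-256 (assumed scheme)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def select_segment(rows, required):
    """Return hashed emails for rows that carry every required attribute."""
    return [hash_email(r["email"]) for r in rows if required <= r["attributes"]]


segment = select_segment(customers, {"moms", "health conscious"})
print(len(segment))
```

The output is a flat list of identifiers. Nothing about why the segment behaves the way it does survives the export, which is the gap a RAG pipeline is meant to fill.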

But in the age of generative AI, marketers need a RAG pipeline to ingest, chunk and embed rich information about the “health-conscious moms” segment. This data includes social listening, qualitative panels and customer reviews, as well as creative features that have historically performed well against that group. 

Chunking and embedding may not be familiar terms to advertisers. Chunking means breaking down long, unstructured text files and multilayered images into smaller homogeneous pieces. Embedding means translating these pieces into a multidimensional mathematical representation. Together, these processes make it possible for an LLM to work with the data. 
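
For readers who want to see the two steps side by side, here is a toy sketch. The chunker splits on a fixed word count, and `toy_embed` is a hypothetical stand-in that hashes words into a small vector. A production pipeline would call a trained embedding model instead; the review text is invented for illustration.

```python
import hashlib
import math


def chunk_text(text, max_words=8):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(chunk, dims=16):
    """Map a chunk to a unit-length vector by hashing each word into a bucket.

    A stand-in for a trained embedding model, for illustration only.
    """
    vec = [0.0] * dims
    for word in chunk.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Hypothetical customer review.
review = ("I buy this snack for my kids because it has no added sugar "
          "and the packaging is easy to open on the go")

chunks = chunk_text(review)
vectors = [toy_embed(c) for c in chunks]
print(len(chunks), len(vectors[0]))
```

Each chunk now has a numeric vector that can be stored and compared with other vectors, which is what lets a pipeline find relevant pieces later.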

When using a generative AI application, the prompt that is sent to the foundation model to create a creative variant that is optimized for health-conscious moms will be enriched by the RAG pipeline. The pipeline adds the most relevant pieces of knowledge extracted from the brand’s first-party data universe to the prompt, so the resulting creative is tailored to this audience group with data points specific to the brand.
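
The enrichment step can be sketched as a retrieve-then-prepend routine. This is a simplified illustration, not a definitive implementation: the word-count `embed` function is a toy substitute for a real embedding model, the knowledge chunks are hypothetical, and a real pipeline would query a vector store instead of ranking an in-memory list.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real pipeline would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def enrich_prompt(instruction, chunks, top_k=2):
    """Prepend the most relevant first-party chunks to the instruction."""
    context = "\n".join(f"- {c}" for c in retrieve(instruction, chunks, top_k))
    return f"Context from first-party data:\n{context}\n\nTask: {instruction}"


# Hypothetical chunks drawn from a brand's first-party data.
knowledge = [
    "health-conscious moms respond to creative showing no added sugar claims",
    "weekend sports fans prefer bold colors and stadium imagery",
    "reviews from moms praise packaging that is easy to open",
]

instruction = "create a creative variant for health-conscious moms"
prompt = enrich_prompt(instruction, knowledge)
print(prompt)
```

Only the chunks that relate to the audience reach the model, so the unrelated sports insight is left out of the prompt.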

This new paradigm presents an opportunity for brands that were traditionally considered data-poor because they didn’t have the necessary value exchange with users to collect large amounts of structured consumer-level data. 

Smarter use of first-party data

By implementing strong data management processes and RAG pipelines, companies in verticals such as CPG and automotive can sift through the huge amount of unstructured audience insights and creative performance reports they have accumulated over time. These insights can be used to create a new type of competitive advantage based on first-party data. 

No matter how powerful new AI models become, they will always be better when provided with relevant data. The sooner marketers realize the opportunity, the better.

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.
