The IAB Tech Lab Isn’t Pulling Any Punches In The Fight Against AI Scraping

Digital publishers are in a losing battle against Big Tech and AI for traffic and ad revenue. But the IAB Tech Lab has a plan to swing the momentum back in pubs’ favor. On Tuesday, the Tech Lab announced a new publisher-focused working group that aims to ensure pubs are fairly compensated when AI scrapes their content for training.

IAB Tech Lab CEO Anthony Katsur previewed the new initiative at AdMonsters’ Sell Side Summit in Nashville, Tenn., on Monday. He also shed some light on the IAB and Tech Lab’s efforts to lobby federal and state lawmakers to prioritize copyright enforcement.

And he threw out plenty of red meat for Sell Side Summit’s publisher-heavy crowd.

“If you are being crawled, and you’re not being paid for your content, the Tech Lab’s position on that is that’s theft, full stop,” Katsur said to a round of applause.

The “CoMP” in “compensation”

The new Tech Lab initiative, dubbed the AI Content Monetization Protocols (CoMP) Working Group, will publish a technical framework geared toward compelling AI companies and LLM operators to give publishers fair value for their content.

In addition to protecting publisher content, the framework would also protect content from brand websites and marketing pages, Katsur said.

The framework will have three areas of focus: creating a reliable mechanism for publishers to block unwanted bot crawlers, helping users discover publisher content through LLMs, and creating APIs that let publishers opt in to having LLMs ingest their data.

An API-based model would make it easier to ensure that publishers are compensated based on how often their content shows up in LLM queries, Katsur said. The Tech Lab believes this method of compensation will be a more valuable revenue model for publishers than charging LLMs for data access. In that sense, the Tech Lab framework is a break from Cloudflare’s new pay-per-crawl model.

“We don’t believe cost per crawl scales,” Katsur said. “We think cost per user query scales.”

In addition, a logging API would provide accountability for making sure LLMs honor publishers’ terms for how their data is used, Katsur said.

The framework should eventually give publishers and brands ways to control how access to certain data is priced, he added.

“Your late-breaking interview with Taylor Swift would be valued very differently than archival content from 20 years ago,” he said. (It’s an example he uses often.)

To facilitate that access, the Tech Lab envisions the “tokenization of the content feed,” Katsur said. Basically, that means each piece of publisher content would be broken up into smaller parts that can be more easily ingested by LLMs. It would also be easier to track and provide accountability for how LLMs use these smaller, digestible bits of data, he said.
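The Tech Lab has not published a specification for this, so the following is only a rough sketch of what splitting an article into trackable pieces might look like. The `ContentChunk` type, the `chunk_article` and `record_usage` helpers, the paragraph-level split and the hash-based IDs are all assumptions made for illustration, not part of CoMP.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentChunk:
    """One trackable piece of an article (hypothetical structure)."""
    chunk_id: str
    article_id: str
    position: int
    text: str


def chunk_article(article_id: str, body: str) -> list[ContentChunk]:
    """Split an article into paragraph-level chunks, each with a stable ID."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks = []
    for position, text in enumerate(paragraphs):
        # Assumption: derive the ID from article, position and text so that
        # the same paragraph always maps to the same identifier.
        key = f"{article_id}:{position}:{text}".encode("utf-8")
        chunk_id = hashlib.sha256(key).hexdigest()[:12]
        chunks.append(ContentChunk(chunk_id, article_id, position, text))
    return chunks


def record_usage(log: Counter, chunk: ContentChunk) -> None:
    """Count one use of a chunk, e.g. when it informs an LLM answer."""
    log[chunk.chunk_id] += 1


article = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = chunk_article("story-001", article)

usage_log: Counter = Counter()
record_usage(usage_log, chunks[0])
record_usage(usage_log, chunks[0])
record_usage(usage_log, chunks[2])
```

With per-chunk IDs, a usage log like `usage_log` could in principle show which pieces of a story were used and how often, which is the kind of accountability Katsur described.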

Big Tech support, or lack thereof

But, in order to make this framework viable, the Tech Lab realizes it needs buy-in from AI companies, Katsur said.

Thus far, he’s been encouraged by Google’s and Meta’s involvement in the Tech Lab’s working group. “When we had our LLM workshop in New York earlier this summer, they were in the room,” he said.

However, he said, OpenAI, Anthropic and Perplexity won’t even return the Tech Lab’s calls.

Although Google has been an important early partner for the Tech Lab’s initiative, Katsur didn’t completely spare it from criticism.

When Scott Messer, summit co-host and founder of sell-side consultancy Messer Media, asked if Google should use two different crawlers for indexing search results and LLM training, Katsur was unequivocal. “Google needs to split their crawler up,” he said.

Lobbying against LLMs

While support from AI companies is key for the Tech Lab’s initiative, government support for enforcing and enhancing copyright protections is just as essential, Katsur said. Both the Tech Lab and the IAB will be more active in lobbying the government on that front going forward, he said.

Those lobbying efforts will include trying to put some teeth behind enforcement of industry standards like robots.txt, which publishers use to govern which bots are authorized to crawl their sites. Katsur suggested heavy fines and penalties for companies that ignore “do not crawl” directives in robots.txt.
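For readers less familiar with robots.txt, here is a minimal sketch of how a well-behaved crawler is expected to consult it, using Python’s standard-library `urllib.robotparser`. The bot names `ExampleAIBot` and `ExampleSearchBot` and the `publisher.example` URL are made up for illustration. The check is voluntary on the crawler’s side, which is the gap Katsur wants penalties to close.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


story_url = "https://publisher.example/news/story"
ai_allowed = is_allowed(ROBOTS_TXT, "ExampleAIBot", story_url)
search_allowed = is_allowed(ROBOTS_TXT, "ExampleSearchBot", story_url)

print(f"ExampleAIBot allowed: {ai_allowed}")
print(f"ExampleSearchBot allowed: {search_allowed}")
```

Nothing in the file itself stops a crawler that skips this check, so enforcement depends on the crawler’s operator choosing to comply.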

“There has to be policy change at Capitol Hill,” he said. “When Congress comes back in session, there are going to be several conversations around copyright law down in DC.”

In a conversation with AdExchanger after his presentation, Katsur added that both the Tech Lab and the IAB are increasingly prioritizing outreach to state-level governments. Most of the recent advancements in regulating the internet and AI have happened at the state level, rather than at the federal level.

Katsur also encouraged publishers with political connections – particularly news publishers – to advocate for their own interests. He suggested appealing to politicians’ awareness of the worsening online information environment.

“If you want quality journalism and ad-subsidized free news and information in Western democracies, this shit has to stop,” he said. “I would engage with your politicians for sure and let your voices be heard.”

And, he added, publishers should also continue to take the fight against unauthorized content scraping to the courts.

“For crawlers that do not obey do not crawl,” he said, “sue the shit out of them.”
