The IAB Tech Lab Isn’t Pulling Any Punches In The Fight Against AI Scraping

Digital publishers are in a losing battle against Big Tech and AI for traffic and ad revenue. But the IAB Tech Lab has a plan to swing the momentum back in pubs’ favor. On Tuesday, the Tech Lab announced a new publisher-focused working group that aims to ensure pubs are fairly compensated when AI scrapes their content for training.

IAB Tech Lab CEO Anthony Katsur previewed the new initiative at AdMonsters’ Sell Side Summit in Nashville, Tenn., on Monday. He also shed some light on the IAB and Tech Lab’s efforts to lobby federal and state lawmakers to prioritize copyright enforcement.

And he threw out plenty of red meat for Sell Side Summit’s publisher-heavy crowd.

“If you are being crawled, and you’re not being paid for your content, the Tech Lab’s position on that is that’s theft, full stop,” Katsur said to a round of applause.

The “CoMP” in “compensation”

The new Tech Lab initiative, dubbed the AI Content Monetization Protocols (CoMP) Working Group, will publish a technical framework geared toward compelling AI companies and LLM operators to give publishers fair value for their content.

In addition to protecting publisher content, the framework would also protect content from brand websites and marketing pages, Katsur said.

The framework will have three areas of focus. They include creating a reliable mechanism for publishers to block unwanted bot crawlers, helping users discover publisher content through the use of LLMs and creating APIs that allow publishers to opt into LLMs ingesting their data.

An API-based model would make it easier to ensure that publishers are compensated based on how often their content shows up in LLM queries, Katsur said. The Tech Lab believes this method of compensation will be a more valuable revenue model for publishers than charging LLMs for data access. In that sense, the Tech Lab framework is a break from Cloudflare’s new pay to crawl model.

“We don’t believe cost per crawl scales,” Katsur said. “We think cost per user query scales.”

In addition, a logging API would provide accountability for making sure LLMs honor publishers’ terms for how their data is used, Katsur said.

AdExchanger Daily

Get our editors’ roundup delivered to your inbox every weekday.

Daily Roundup

Daily News Roundup

Daily Mail Launches A Social Media Agency; AI Eats The Food Blogs’ Lunch

The framework should eventually give publishers and brands ways to control how access to certain data is priced, he added.

“Your late-breaking interview with Taylor Swift would be valued very differently than archival content from 20 years ago,” he said. (It’s an example he uses often.)

To facilitate that access, the Tech Lab envisions the “tokenization of the content feed,” Katsur said. Basically, that means each piece of publisher content would be broken up into smaller parts that can be more easily ingested by LLMs. It would also be easier to track and provide accountability for how LLMs use these smaller, digestible bits of data, he said.

Big Tech support, or lack thereof

But, in order to make this framework viable, the Tech Lab realizes it needs buy-in from AI companies, Katsur said.

Thus far, he’s been encouraged by Google’s and Meta’s involvement in the Tech Lab’s working group. “When we had our LLM workshop in New York earlier this summer, they were in the room,” he said.

However, he said, OpenAI, Anthropic and Perplexity won’t even return the Tech Lab’s calls.

Although Google has been an important early partner for the Tech Lab’s initiative, Katsur didn’t completely spare them from criticism.

When Scott Messer, summit co-host and founder of sell-side consultancy Messer Media, asked if Google should use two different crawlers for indexing search results and LLM training, Katsur was unequivocal. “Google needs to split their crawler up,” he said.

Lobbying against LLMs

While support from AI companies is key for the Tech Lab’s initiative, government support for enforcing and enhancing copyright protections is just as essential, Katsur said. Both the Tech Lab and the IAB will be more active in lobbying the government on that front going forward, he said.

Those lobbying efforts will include trying to put some teeth behind enforcement of industry standards like robots.txt, which publishers use to govern which bots are authorized to crawl their sites. Katsur suggested heavy fines and penalties for companies that ignore “do not crawl” directives in robots.txt.

“There has to be policy change at Capitol Hill,” he said. “When Congress comes back in session, there are going to be several conversations around copyright law down in DC.”

In a conversation with AdExchanger after his presentation, Katsur added that both the Tech Lab and the IAB are increasingly prioritizing outreach to state-level governments. Most of the recent advancements in regulating the internet and AI have happened at the state level, rather than at the federal level.

Katsur also encouraged publishers with political connections – particularly news publishers – to advocate for their own interests. He suggested appealing to politicians’ awareness of the worsening online information environment.

“If you want quality journalism and ad-subsidized free news and information in Western democracies, this shit has to stop,” he said. “I would engage with your politicians for sure and let your voices be heard.”

And, he added, publishers should also continue to take the fight against unauthorized content scraping to the courts.

“For crawlers that do not obey do not crawl,” he said, “sue the shit out of them.”