The IAB Tech Lab Isn’t Pulling Any Punches In The Fight Against AI Scraping

Digital publishers are in a losing battle against Big Tech and AI for traffic and ad revenue. But the IAB Tech Lab has a plan to swing the momentum back in pubs’ favor. On Tuesday, the Tech Lab announced a new publisher-focused working group that aims to ensure pubs are fairly compensated when AI scrapes their content for training.

IAB Tech Lab CEO Anthony Katsur previewed the new initiative at AdMonsters’ Sell Side Summit in Nashville, Tenn., on Monday. He also shed some light on the IAB and Tech Lab’s efforts to lobby federal and state lawmakers to prioritize copyright enforcement.

And he threw out plenty of red meat for Sell Side Summit’s publisher-heavy crowd.

“If you are being crawled, and you’re not being paid for your content, the Tech Lab’s position on that is that’s theft, full stop,” Katsur said to a round of applause.

The “CoMP” in “compensation”

The new Tech Lab initiative, dubbed the AI Content Monetization Protocols (CoMP) Working Group, will publish a technical framework geared toward compelling AI companies and LLM operators to give publishers fair value for their content.

In addition to protecting publisher content, the framework would also protect content from brand websites and marketing pages, Katsur said.

The framework will have three areas of focus. They include creating a reliable mechanism for publishers to block unwanted bot crawlers, helping users discover publisher content through the use of LLMs and creating APIs that allow publishers to opt into LLMs ingesting their data.

An API-based model would make it easier to ensure that publishers are compensated based on how often their content shows up in LLM queries, Katsur said. The Tech Lab believes this method of compensation will be a more valuable revenue model for publishers than charging LLMs for data access. In that sense, the Tech Lab framework is a break from Cloudflare’s new pay-to-crawl model.

“We don’t believe cost per crawl scales,” Katsur said. “We think cost per user query scales.”

In addition, a logging API would provide accountability for making sure LLMs honor publishers’ terms for how their data is used, Katsur said.

The framework should eventually give publishers and brands ways to control how access to certain data is priced, he added.

“Your late-breaking interview with Taylor Swift would be valued very differently than archival content from 20 years ago,” he said. (It’s an example he uses often.)

To facilitate that access, the Tech Lab envisions the “tokenization of the content feed,” Katsur said. Basically, that means each piece of publisher content would be broken up into smaller parts that can be more easily ingested by LLMs. It would also be easier to track and provide accountability for how LLMs use these smaller, digestible bits of data, he said.

Big Tech support, or lack thereof

But, in order to make this framework viable, the Tech Lab realizes it needs buy-in from AI companies, Katsur said.

Thus far, he’s been encouraged by Google’s and Meta’s involvement in the Tech Lab’s working group. “When we had our LLM workshop in New York earlier this summer, they were in the room,” he said.

However, he said, OpenAI, Anthropic and Perplexity won’t even return the Tech Lab’s calls.

Although Google has been an important early partner for the Tech Lab’s initiative, Katsur didn’t completely spare the company from criticism.

When Scott Messer, summit co-host and founder of sell-side consultancy Messer Media, asked if Google should use two different crawlers for indexing search results and LLM training, Katsur was unequivocal. “Google needs to split their crawler up,” he said.

Lobbying against LLMs

While support from AI companies is key for the Tech Lab’s initiative, government support for enforcing and enhancing copyright protections is just as essential, Katsur said. Both the Tech Lab and the IAB will be more active in lobbying the government on that front going forward, he said.

Those lobbying efforts will include trying to put some teeth behind enforcement of industry standards like robots.txt, which publishers use to govern which bots are authorized to crawl their sites. Katsur suggested heavy fines and penalties for companies that ignore “do not crawl” directives in robots.txt.
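For readers less familiar with how robots.txt works, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The bot names, the sample rules and the `publisher.example` URL are made up for illustration and do not come from the Tech Lab or any real publisher. The point it shows is the one behind Katsur's call for penalties: the file only states the publisher's wishes, and it is up to each crawler to check it and comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
# "ExampleAIBot" is an invented name used only for this sketch.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as a list of lines,
    # so no network request is needed for this example.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://publisher.example/news/article"
    for bot in ("ExampleAIBot", "SomeSearchBot"):
        verdict = "allowed" if can_crawl(ROBOTS_TXT, bot, article) else "blocked"
        print(f"{bot}: {verdict}")
```

A well-behaved crawler runs a check like this before every fetch. Nothing in the file itself stops a crawler that skips the check, which is why enforcement has to come from somewhere else.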

“There has to be policy change at Capitol Hill,” he said. “When Congress comes back in session, there are going to be several conversations around copyright law down in DC.”

In a conversation with AdExchanger after his presentation, Katsur added that both the Tech Lab and the IAB are increasingly prioritizing outreach to state-level governments. Most of the recent advancements in regulating the internet and AI have happened at the state level, rather than at the federal level.

Katsur also encouraged publishers with political connections – particularly news publishers – to advocate for their own interests. He suggested appealing to politicians’ awareness of the worsening online information environment.

“If you want quality journalism and ad-subsidized free news and information in Western democracies, this shit has to stop,” he said. “I would engage with your politicians for sure and let your voices be heard.”

And, he added, publishers should also continue to take the fight against unauthorized content scraping to the courts.

“For crawlers that do not obey do not crawl,” he said, “sue the shit out of them.”
