The IAB Tech Lab Isn’t Pulling Any Punches In The Fight Against AI Scraping

Digital publishers are in a losing battle against Big Tech and AI for traffic and ad revenue. But the IAB Tech Lab has a plan to swing the momentum back in pubs’ favor. On Tuesday, the Tech Lab announced a new publisher-focused working group that aims to ensure pubs are fairly compensated when AI scrapes their content for training.

IAB Tech Lab CEO Anthony Katsur previewed the new initiative at AdMonsters’ Sell Side Summit in Nashville, Tenn., on Monday. He also shed some light on the IAB and Tech Lab’s efforts to lobby federal and state lawmakers to prioritize copyright enforcement.

And he threw out plenty of red meat for Sell Side Summit’s publisher-heavy crowd.

“If you are being crawled, and you’re not being paid for your content, the Tech Lab’s position on that is that’s theft, full stop,” Katsur said to a round of applause.

The “CoMP” in “compensation”

The new Tech Lab initiative, dubbed the AI Content Monetization Protocols (CoMP) Working Group, will publish a technical framework geared toward compelling AI companies and LLM operators to give publishers fair value for their content.

In addition to protecting publisher content, the framework would also protect content from brand websites and marketing pages, Katsur said.

The framework will have three areas of focus. They include creating a reliable mechanism for publishers to block unwanted bot crawlers, helping users discover publisher content through the use of LLMs and creating APIs that allow publishers to opt into LLMs ingesting their data.

An API-based model would make it easier to ensure that publishers are compensated based on how often their content shows up in LLM queries, Katsur said. The Tech Lab believes this method of compensation will be a more valuable revenue model for publishers than charging LLMs for data access. In that sense, the Tech Lab framework is a break from Cloudflare’s new pay-per-crawl model.

“We don’t believe cost per crawl scales,” Katsur said. “We think cost per user query scales.”

In addition, a logging API would provide accountability for making sure LLMs honor publishers’ terms for how their data is used, Katsur said.

The framework should eventually give publishers and brands ways to control how access to certain data is priced, he added.

“Your late-breaking interview with Taylor Swift would be valued very differently than archival content from 20 years ago,” he said. (It’s an example he uses often.)

To facilitate that access, the Tech Lab envisions the “tokenization of the content feed,” Katsur said. Basically, that means each piece of publisher content would be broken up into smaller parts that can be more easily ingested by LLMs. It would also be easier to track and provide accountability for how LLMs use these smaller, digestible bits of data, he said.

Big Tech support, or lack thereof

But, in order to make this framework viable, the Tech Lab realizes it needs buy-in from AI companies, Katsur said.

Thus far, he’s been encouraged by Google’s and Meta’s involvement in the Tech Lab’s working group. “When we had our LLM workshop in New York earlier this summer, they were in the room,” he said.

However, he said, OpenAI, Anthropic and Perplexity won’t even return the Tech Lab’s calls.

Although Google has been an important early partner for the Tech Lab’s initiative, Katsur didn’t completely spare the company from criticism.

When Scott Messer, summit co-host and founder of sell-side consultancy Messer Media, asked if Google should use two different crawlers for indexing search results and LLM training, Katsur was unequivocal. “Google needs to split their crawler up,” he said.

Lobbying against LLMs

While support from AI companies is key for the Tech Lab’s initiative, government support for enforcing and enhancing copyright protections is just as essential, Katsur said. Both the Tech Lab and the IAB will be more active in lobbying the government on that front going forward, he said.

Those lobbying efforts will include trying to put some teeth behind enforcement of industry standards like robots.txt, which publishers use to govern which bots are authorized to crawl their sites. Katsur suggested heavy fines and penalties for companies that ignore “do not crawl” directives in robots.txt.
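For readers unfamiliar with how robots.txt works in practice, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard `urllib.robotparser` module. The bot names (`ExampleAIBot`, `SearchIndexBot`), the `publisher.example` domain and the rules themselves are hypothetical and chosen only for illustration; they do not come from the Tech Lab or from any real publisher's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (made-up) AI crawler from the whole
# site and leaves every other bot unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://publisher.example/news/story"
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", story))    # False
    print(is_allowed(ROBOTS_TXT, "SearchIndexBot", story))  # True
```

Nothing in this check is enforced by the publisher's server. The crawler itself decides whether to run it and whether to respect the answer, which is why Katsur wants penalties attached to ignoring the directives.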

“There has to be policy change at Capitol Hill,” he said. “When Congress comes back in session, there are going to be several conversations around copyright law down in DC.”

In a conversation with AdExchanger after his presentation, Katsur added that both the Tech Lab and the IAB are increasingly prioritizing outreach to state-level governments. Most of the recent advancements in regulating the internet and AI have happened at the state level, rather than at the federal level.

Katsur also encouraged publishers with political connections – particularly news publishers – to advocate for their own interests. He suggested appealing to politicians’ awareness of the worsening online information environment.

“If you want quality journalism and ad-subsidized free news and information in Western democracies, this shit has to stop,” he said. “I would engage with your politicians for sure and let your voices be heard.”

And, he added, publishers should also continue to take the fight against unauthorized content scraping to the courts.

“For crawlers that do not obey do not crawl,” he said, “sue the shit out of them.”
