The IAB Tech Lab Isn’t Pulling Any Punches In The Fight Against AI Scraping

Digital publishers are in a losing battle against Big Tech and AI for traffic and ad revenue. But the IAB Tech Lab has a plan to swing the momentum back in pubs’ favor. On Tuesday, the Tech Lab announced a new publisher-focused working group that aims to ensure pubs are fairly compensated when AI scrapes their content for training.

IAB Tech Lab CEO Anthony Katsur previewed the new initiative at AdMonsters’ Sell Side Summit in Nashville, Tenn., on Monday. He also shed some light on the IAB and Tech Lab’s efforts to lobby federal and state lawmakers to prioritize copyright enforcement.

And he threw out plenty of red meat for Sell Side Summit’s publisher-heavy crowd.

“If you are being crawled, and you’re not being paid for your content, the Tech Lab’s position on that is that’s theft, full stop,” Katsur said to a round of applause.

The “CoMP” in “compensation”

The new Tech Lab initiative, dubbed the AI Content Monetization Protocols (CoMP) Working Group, will publish a technical framework geared toward compelling AI companies and LLM operators to give publishers fair value for their content.

In addition to protecting publisher content, the framework would also protect content from brand websites and marketing pages, Katsur said.

The framework will have three areas of focus: creating a reliable mechanism for publishers to block unwanted bot crawlers, helping users discover publisher content through LLMs, and creating APIs that allow publishers to opt into LLMs ingesting their data.

An API-based model would make it easier to ensure that publishers are compensated based on how often their content shows up in LLM queries, Katsur said. The Tech Lab believes this method of compensation will be a more valuable revenue model for publishers than charging LLMs for data access. In that sense, the Tech Lab framework is a break from Cloudflare’s new pay-per-crawl model.

“We don’t believe cost per crawl scales,” Katsur said. “We think cost per user query scales.”

In addition, a logging API would provide accountability for making sure LLMs honor publishers’ terms for how their data is used, Katsur said.

The framework should eventually give publishers and brands ways to control how access to certain data is priced, he added.

“Your late-breaking interview with Taylor Swift would be valued very differently than archival content from 20 years ago,” he said. (It’s an example he uses often.)
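To make the per-query idea concrete, here is a minimal sketch of how a usage log could be turned into publisher payouts with different rates for different kinds of content. Everything in it is an assumption for illustration: the Tech Lab has not published a log format or pricing scheme, and the `UsageEvent` record, the tier names and the rates are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One hypothetical log entry: an LLM answer drew on a publisher's content."""
    publisher: str
    content_tier: str  # hypothetical tiers: "breaking" or "archive"


# Hypothetical per-query rates in US cents (integers avoid float rounding).
RATE_CENTS = {"breaking": 5, "archive": 1}


def payouts_cents(events):
    """Sum what each publisher is owed, charging per query rather than per crawl."""
    totals = defaultdict(int)
    for event in events:
        totals[event.publisher] += RATE_CENTS[event.content_tier]
    return dict(totals)


# Illustrative log: fresh content earns more per query than archival content.
log = [
    UsageEvent("pub-a", "breaking"),
    UsageEvent("pub-a", "archive"),
    UsageEvent("pub-b", "archive"),
]
totals = payouts_cents(log)
print(totals)
```

The point of the sketch is the billing unit: payment follows each logged use of the content in a query, not each visit by a crawler.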

To facilitate that access, the Tech Lab envisions the “tokenization of the content feed,” Katsur said. Basically, that means each piece of publisher content would be broken up into smaller parts that can be more easily ingested by LLMs. It would also be easier to track and provide accountability for how LLMs use these smaller, digestible bits of data, he said.
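As a rough illustration of what breaking content into trackable parts might look like, the sketch below splits an article into paragraphs and gives each one a stable identifier. This is an assumption, not the Tech Lab's design: no format has been published, and the `ContentChunk` record, the `chunk_article` helper and the hash-based ID are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentChunk:
    chunk_id: str      # hypothetical identifier an LLM could cite in usage logs
    article_url: str
    position: int      # paragraph index within the article
    text: str


def chunk_article(article_url: str, body: str) -> list[ContentChunk]:
    """Split an article on blank lines and assign each paragraph a stable ID."""
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks = []
    for position, text in enumerate(paragraphs):
        # Hash the URL, position and text so the same paragraph always gets the same ID.
        key = f"{article_url}#{position}:{text}".encode("utf-8")
        chunk_id = hashlib.sha256(key).hexdigest()[:16]
        chunks.append(ContentChunk(chunk_id, article_url, position, text))
    return chunks


chunks = chunk_article(
    "https://example.com/story",
    "First paragraph.\n\nSecond paragraph.",
)
for chunk in chunks:
    print(chunk.chunk_id, chunk.position, chunk.text)
```

Because each part carries its own ID, a log entry could point to the exact paragraph an LLM used rather than to the whole article.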

Big Tech support, or lack thereof

But, in order to make this framework viable, the Tech Lab realizes it needs buy-in from AI companies, Katsur said.

Thus far, he’s been encouraged by Google’s and Meta’s involvement in the Tech Lab’s working group. “When we had our LLM workshop in New York earlier this summer, they were in the room,” he said.

However, he said, OpenAI, Anthropic and Perplexity won’t even return the Tech Lab’s calls.

Although Google has been an important early partner for the Tech Lab’s initiative, Katsur didn’t completely spare them from criticism.

When Scott Messer, summit co-host and founder of sell-side consultancy Messer Media, asked if Google should use two different crawlers for indexing search results and LLM training, Katsur was unequivocal. “Google needs to split their crawler up,” he said.

Lobbying against LLMs

While support from AI companies is key for the Tech Lab’s initiative, government support for enforcing and enhancing copyright protections is just as essential, Katsur said. Both the Tech Lab and the IAB will be more active in lobbying the government on that front going forward, he said.

Those lobbying efforts will include trying to put some teeth behind enforcement of industry standards like robots.txt, which publishers use to govern which bots are authorized to crawl their sites. Katsur suggested heavy fines and penalties for companies that ignore “do not crawl” directives in robots.txt.
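For readers unfamiliar with how robots.txt works in practice, the sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would check a site's rules before fetching a page. The robots.txt contents, the bot name `ExampleAIBot` and the URLs are hypothetical. Note that nothing here forces compliance: the check only works if the crawler chooses to run it, which is the enforcement gap Katsur is describing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely, and keep
# every other bot out of the /private/ section.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit this bot to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ai_allowed = is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/news/story")
other_allowed = is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/news/story")
private_allowed = is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/private/draft")

print(ai_allowed, other_allowed, private_allowed)
```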

“There has to be policy change at Capitol Hill,” he said. “When Congress comes back in session, there are going to be several conversations around copyright law down in DC.”

In a conversation with AdExchanger after his presentation, Katsur added that both the Tech Lab and the IAB are increasingly prioritizing outreach to state-level governments. Most of the recent advancements in regulating the internet and AI have happened at the state level, rather than at the federal level.

Katsur also encouraged publishers with political connections – particularly news publishers – to advocate for their own interests. He suggested appealing to politicians’ awareness of the worsening online information environment.

“If you want quality journalism and ad-subsidized free news and information in Western democracies, this shit has to stop,” he said. “I would engage with your politicians for sure and let your voices be heard.”

And, he added, publishers should also continue to take the fight against unauthorized content scraping to the courts.

“For crawlers that do not obey do not crawl,” he said, “sue the shit out of them.”
