A simmering resentment among digital publishers has finally boiled over.
They’re fed up with brand safety and verification vendors using crawlers to scrape their sites for contextual signals, then using those signals to sell contextual ad products.
The contextual data scraping problem was detailed in an open letter published last week by the UK’s Association of Online Publishers (AOP), written by its managing director, Richard Reeves.
The AOP hopes Reeves’ letter will spur wider discussion of the improper use of publisher IP and prompt the industry to find a fairer path forward for publishers. Specifically, Reeves calls on the industry to work together on equitable licensing agreements between publishers and verification vendors.
“[Publishers] entirely support the need to verify campaigns,” Reeves told AdExchanger. But he said publishers should also have discretion over whether verification vendors can repackage the data for advertisers and agencies.
Contextual commercialization
Publishers allow verification vendors to operate crawlers on their sites because buyers won’t purchase unverified inventory.
But publishers say using crawlers to gather contextual signals for ad products goes beyond the scope of their agreements. They’re not compensated for this unlicensed use of their intellectual property. And the only way to prevent verification vendors from scraping sites is to not allow any search crawlers, which would seriously diminish search traffic.
The data-scraping problem has gotten worse since brand safety providers began rolling out their own contextual targeting products.
In 2020, Integral Ad Science (IAS) released Context Control, a product based on language-parsing technology it acquired from ADmantX in 2019. About half of IAS’ 2022 programmatic revenue came from the offering, according to its Q4 2022 earnings.
DoubleVerify introduced a similar product, called Custom Contextual Solution, also in 2020.
Neither IAS nor DoubleVerify responded to AdExchanger’s request for comment.
The AOP tried to discuss Context Control with IAS, Reeves said, but IAS claimed the product couldn’t be uncoupled from its brand safety crawler. IAS told the AOP that publishers’ only recourse is to block IAS’ crawlers, but warned that doing so could hurt their revenue.
This kind of reaction speaks to an emerging dynamic of brand safety providers directly competing with publishers for advertiser dollars rather than simply collecting a service-layer fee, said Justin Wohl, CRO at Salon.
“These companies have replaced the value exchange for agencies to come straight to publishers,” Wohl said. And brand safety companies have been able to build a level of trust from years of providing verification services, whereas some buyers are still mistrustful of contextual audiences built by publishers.
Now, publishers are seeing their CPMs for open-market programmatic down 40% year over year, Wohl continued. “And the brand safety businesses are not seeing the same.”
Indeed, IAS’ programmatic revenues grew 30% year over year, according to its Q4 earnings.
Calls to action
But despite questions about where programmatic revenue is flowing, the AOP’s letter isn’t just about publishers looking to collect a revenue share from brand safety vendors for contextual ad products, Reeves emphasized.
Rather, it’s about publishers’ right to defend their choice of partners and control over their first-party data.
The AOP letter calls on the buy side to protect publisher interests by buying contextual segments only from vendors whose publisher licensing agreements authorize them to monetize the IP, as opposed to scraping it without sanction.
But buyers have little transparency into publisher vendor licensing agreements, said Deva Bronson, EVP and global head of brand assurance at Dentsu. Rather, ad buyers rely on certification partners like the Media Rating Council (MRC) and the Trustworthy Accountability Group (TAG) to determine if vendors are operating within the scope of their agreements.
Before releasing the open letter, the AOP worked with TAG on the latest update to TAG’s Brand Safety Certification Guidelines (v2.11), adding language that distinguishes between legitimate and illegitimate data usage by brand safety vendors.
The new guidelines (found in clause 4.5 of TAG’s certification) state that sellers will not lose their Brand Safety Certification if they block technologies that gather data for illegitimate purposes from being deployed on their sites.
Unfortunately, the content scraping and commercialization issues described in the AOP’s letter aren’t unique to brand safety vendors.
“Every intermediary who’s operating the supply chain – like Criteo and Google Search Appliance – scrapes a publisher’s website and indexes or packages its content for consumption by buyers or consumers,” said Scott Cunningham, a founding member of TAG, chair of the Brand Safety Institute’s publisher council and lead for the Local Media Consortium’s NewsPassID and NewsNext initiatives.
And tech intermediaries have been using publisher data stripped from the RTB bid stream for years.
An open letter published by BPA Worldwide in 2020 called attention to bid stream data leakage much as the AOP’s letter does for data scraping, said Havona Madama, Bombora’s chief data privacy officer and general counsel. BPA’s letter even proposed many of the same solutions as the AOP’s letter, such as more education, more collaboration, and clearer contractual language and certification requirements. Yet the issue persists three years later.
The nuclear option
Reeves’ letter warned that if the industry does not collaborate on a solution, publishers could take “more radical, disruptive” action.
That could include pursuing legal action against offending vendors. To that end, the AOP is closely monitoring Getty Images’ pending case against Stability AI, which could set a precedent for protecting IP from data-scraping bots.
But any legal action would take years to work its way through the court system, and publishers have little appetite for protracted legal battles they’re likely to lose.
Besides, in the US, LinkedIn’s lawsuit against hiQ, in which LinkedIn attempted to stop hiQ from scraping publicly available user profiles, failed to set a precedent that would be favorable to publishers. The Ninth Circuit’s decision said preventing such data scraping would be anticompetitive. That means scraping publicly available content is currently not prohibited, Madama said.
All sources interviewed for this story agreed that a legislative fix is unlikely anytime soon.
However, some European regulatory bodies have expressed interest in publisher complaints about unlicensed use of IP, according to Reeves. He has discussed the issue with the UK’s Competition and Markets Authority (CMA) and Digital Markets Unit (DMU). And the UK’s Information Commissioner’s Office (ICO) has referred the AOP’s complaint to an internal special investigations unit.
Absent any legislative or legal recourse, publishers could band together to institute a “go dark day” in which they collectively turn off permissions for third-party crawling and indexing on their sites, Cunningham said. But doing so would send the wrong message, he stressed, and a collaborative approach would be much more likely to inspire change without sacrificing publisher revenues.
Besides, such aggressive actions might be premature.
Publishers have just started to take the first steps toward establishing what they see as appropriate guidance for site-scraping and contextual vendors, Cunningham said. “It’s up to the industry to rally around that language and see whether we can affect the next level of dialogue and contracts.”