How Eyeo Worked With Students To Innovate On AI

Eyeo is putting the “learning” in machine learning.

Last year, it partnered with a university student initiative in Munich to have students find new approaches to using AI in its online ad filtering.

As the parent company of AdBlock and Adblock Plus, two of the most downloaded ad-blocking services on the market, eyeo has invested a lot of time and money into developing its own deep-learning methods for detecting and filtering ads.

But eyeo had yet to explore how generative modeling could be used to curate and maintain ad filtering lists, which is typically a tedious, error-prone, largely manual process that is difficult to scale.

Eyeo’s extensions rely heavily on filter lists, such as EasyList, that are maintained by a community of mostly unpaid volunteers and contain tens of thousands of rules for blocking and hiding certain network requests and the HTML code that’s responsible for ad rendering.

The ability to automatically update these lists with accuracy would be a major benefit, said Dr. Humera Noor Minhas, director of engineering at eyeo.

Automation station

Last year, TUM.ai, a student initiative within the Technical University of Munich, put out a call for proposals for its Moonshot competition, a hackathon-style project for students interested in pursuing a career in AI.

In response, eyeo submitted a proposal to have TUM.ai students tackle the challenge of developing an AI model that automatically classifies URLs and generates filter rules based on website content.

Eyeo’s proposal stood out due to “its real-world impact,” according to TUM.ai student advisor Thomas Wölkhart.

It also gave students the opportunity to acquire hands-on technical expertise and get direct feedback from eyeo engineers, he said.

See it to believe it

The six-week challenge ran from March to May 2023 and demonstrated how using generative AI could make ad filtering less onerous.

For instance, one student used OpenAI’s APIs to tweak the GPT-3 Ada model, producing an effective model that created automated webpage-based filter rules.

The student’s successful model opened eyeo’s eyes to generative AI’s real-world applications.

What students lack in experience, they often make up for in curiosity and fresh perspective.

But the challenge’s biggest boon for eyeo wasn’t actually the model itself, according to Minhas. It was the data set eyeo created for students to work with during the challenge, which contained more than 1 million ads.

To generate the data set for the challenge, eyeo first looked at open-source lists, like Alexa and Similarweb, to find the most-visited domains in different regions, according to Minhas. Then it gathered information from the HTML of those sites, including the headings, organic content and, crucially for eyeo’s purposes, the ads on the page.

Challenge participants worked with two data sets to develop their AI models: a training set and a test set. The training set included 100,000 pairs of webpages and corresponding filter rules, while the test set held 36,000 webpages.

By dividing data into two sets, students could check how well their models generalized to data they hadn’t seen before, Wölkhart said, and the test set allowed eyeo to judge how closely the student-produced models matched the filters from the training set.

But the process also simulated how companies test model performance before rolling out a model to users – an important step, Wölkhart said, since “one wrong filter rule could break a whole website for thousands of users.”
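One simple way to score a model on held-out pages is the share of generated rules that exactly match the reference rules. The sketch below is only an illustration of that idea: the article does not say which metric eyeo used, and the function name and the sample rules (written in Adblock Plus-style syntax) are hypothetical.

```python
def exact_match_rate(predicted: list[str], reference: list[str]) -> float:
    """Fraction of generated rules identical to the reference rule for the same page."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical held-out examples, one rule per page (not taken from eyeo's data set).
reference = ["||ads.example.com^", "example.com##.ad-banner", "||tracker.example.net^"]
predicted = ["||ads.example.com^", "example.com##.ad-box", "||tracker.example.net^"]
rate = exact_match_rate(predicted, reference)
```

Exact match is a strict measure; a rule that differs from the reference may still block the same ad, so a real evaluation would likely need something more forgiving.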

Eyes ahead

Following the Moonshot project, eyeo went on to use the data set it generated for the challenge to train another model involving URL parameters.

Eyeo tested using URL parameters to detect whether a certain portion of a page contains ads, Minhas said.

URL parameters are key-value pairs in the query string appended to a basic web address. They are used to pass information to a server, track ad campaigns and customize user experiences on a website.
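As a rough illustration of what such parameters look like, the sketch below uses Python’s standard library to split a made-up URL into its query parameters. The URL and the parameter names are hypothetical and do not come from eyeo’s data set.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict[str, list[str]]:
    """Return the query-string parameters of a URL as a dict of value lists."""
    return parse_qs(urlsplit(url).query)


# Hypothetical example URL, for illustration only.
example = "https://example.com/banner?utm_source=newsletter&utm_campaign=spring&size=300x250"
params = query_params(example)
```

Here `params` maps each parameter name to a list of values, since the same name can appear more than once in a query string.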

This new URL-based classifier model achieved precision – a machine learning performance metric that measures the share of a model’s positive predictions that are actually correct – comparable to the solutions eyeo already has in production.
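For readers unfamiliar with the metric, precision is conventionally the number of true positives divided by the number of all positive predictions. A minimal sketch with made-up labels, which are not eyeo’s results:

```python
def precision(y_true: list[int], y_pred: list[int]) -> float:
    """Share of predicted positives (1 = ad) that are actually positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pred_pos = sum(1 for p in y_pred if p == 1)
    return true_pos / pred_pos if pred_pos else 0.0


# Made-up labels for illustration: 1 = ad, 0 = not an ad.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
score = precision(y_true, y_pred)
```

In this toy case the classifier flags three items as ads and two of them really are, so the precision is two-thirds.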

The company is working on a proof of concept and has identified use cases for the new model, such as automatically detecting buggy ad filters. “It has generalization ability that allows us to detect ads in unknown domains and provide a better user experience in ad filtering,” Minhas said.

Next up, eyeo has been researching how to detect ads served in AI chatbot experiences and filter them out if a user doesn’t want to see them anymore.

“The data set was a huge step for us to take our research forward,” Minhas said.
