Fraud-day With Telemetry: “Automating Ad Fraud Detection Is Dangerous”

Ad-serving and verification company Telemetry isn’t in favor of automating the ad fraud detection process.

The problem, said Geo Carncross, the company’s global VP of engineering, is that detection sensors can be fooled into thinking fraudulent impressions are real and, consequently, advertisers will start optimizing for fraud instead of for real ad performance.

Telemetry’s fraud solution is part of its overall managed service.

“We don’t expressly detect ad fraud,” Carncross said. “We identify fraudulent vehicles. The distinction is important because it’s a labor-driven thing. We build tools and technology to make the analyst more able to interact with the vehicle in real time. Once we’ve identified the vehicle, we can do things similar to what other vendors are doing by isolating and separating.”

Carncross and his colleague, Telemetry head of infrastructure Alex Clouter, spoke with AdExchanger.

AdExchanger: How do you define fraud? How do you differentiate fraud from bad media?

GEO CARNCROSS: It’s not a classification. There’s nothing about the impression that makes it fraud. It’s whether the whole ad experience is what was sold to the advertiser. It’s fraud because the advertiser believes it’s fraud when they see what’s going on, when they see the whole vehicle.

ALEX CLOUTER: It’s not what they paid for.

What’s a vehicle?

GC: It might be composed of a chain of individual actors in ad tech that are cooperating directly or their software is being used to manipulate. This combination of events, of technologies, of systems, together, is the vehicle.

AC: Some of these vehicles take rather low-quality spots on legit sites, maybe some of the sites the advertisers want to put them on, then polish them up to make them look like premium spots and sell them on. The sites themselves are not bad. It’s just not what you wanted to buy.

GC: That arbitrage is becoming common. You have real users with real cookies. You have comScore or Nielsen evaluate them, they’re going to pass the test. In that situation, to identify what a vehicle is doing, we’ll go onto exchanges ourselves and try to identify the other bidder and figure out who is paying and buying.

What’s the next step after you isolate the event?

GC: We tell our customer what we found. This traffic is not what they think it is. What they’ll have us do right after that is tell all of their publishers and provide them with a certain amount of assistance in identifying it and making good on it.

Sometimes they decide this vendor I’m buying from can’t clean it up, so I won’t deal with them anymore.

How does that work?

GC: It’s a creative process. You’ll want to look for a cluster of outliers with which to start looking at what the [fraud] vehicle will look like. A site that’s really popular but doesn’t have search traffic or link traffic, that doesn’t have any real content, might be difficult to believe it’s generating a significant amount of traffic. This is where it starts.

Telemetry looks at the site, looks at the traffic and doesn’t believe it. That’s where the investigation starts.

So is there a constant monitoring system or does something have to occur before it comes to your attention?

GC: There’s constant monitoring. People will try to invent new ways of gaming sensors. One thing Alex’s team made possible was a situation where somebody was directly manipulating the code of other ad-verification companies.

They put something into the ad tag to destructively modify the sensors to tell Integral or Adap.tv that it’s always visible. It was initially flagged to the vendor for something else, back in April, because the traffic didn’t seem like…

AC: It was the audibility and they refused to believe they could be wrong. We found they were going in and manipulating their sensors to say it’s audible.

How do you manipulate somebody else’s sensor?

GC: That part is actually pretty easy. It doesn’t require a great deal of programming expertise. It’s rudimentary. For Flash-based stuff there might be decompilation. But a lot of these sensors are just JavaScript tags. You just download somebody’s code and modify it. Change all the words “hidden” to “visible” and make it run. That seems a little unnatural so they’ll put some fuzzer in it: It’s visible 90% or 95%, so it still looks amazing, but it’s not perfect.

The hard part is figuring out how to detect that remotely. Verification companies use software to tell them their ads are good, but the software is being lied to.

So what does one do about that?

GC: That’s why I say it’s a labor-driven thing. It’s an effort-driven thing. Our code is probably a lot more sophisticated than the sites it’s on. It’s not easy to manipulate. If you hammer at it, a lot of stuff you’re trying to do just won’t work.

We have caught people trying to manipulate our code. One of the sensors comes back to us to see whether the ad is visible or not, people try to manipulate that. We don’t actually use that for anything. We have a different test. But if [it’s manipulated], we flag that because we know someone is tampering with our stuff.

The other part is it’s interactive. The analyst actually interacts with the code in real time. Someone goes to the website, the ad starts and for the 15-30 seconds it’s running, the analyst basically has the ability to look around and see what’s going on.
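As a rough illustration of the tamper check Carncross describes, the sketch below compares a decoy visibility report against a separate measurement and flags sessions where the two disagree. The `Session` type and the field names `decoy_visible` and `measured_visible` are hypothetical; the interview does not describe Telemetry's actual sensors or data model.

```python
from dataclasses import dataclass


@dataclass
class Session:
    session_id: str
    decoy_visible: bool     # reported by the exposed, easy-to-tamper sensor
    measured_visible: bool  # result of a separate, independent test (assumed)


def flag_tampered(sessions):
    """Return IDs of sessions whose decoy report contradicts the independent test.

    The decoy value is never used for measurement. A mismatch is only a hint
    that someone may be interfering with the code and that an analyst should look.
    """
    return [s.session_id for s in sessions if s.decoy_visible != s.measured_visible]


sessions = [
    Session("a1", decoy_visible=True, measured_visible=True),
    Session("b2", decoy_visible=True, measured_visible=False),  # decoy says "visible"
    Session("c3", decoy_visible=False, measured_visible=False),
]
print(flag_tampered(sessions))
```

In this toy data only session `b2` is flagged, because its decoy claims visibility that the independent test does not confirm.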

Going forward, can you automate this process?

GC: No, it’ll never be automated.

The longer answer is that automating it is kind of dangerous. Let’s say you’re looking at outlier activity, which is a good start. Let’s say you use that to classify fraudulent activity. The industry response will be: “I need more reach to make up for that, because my media budget remains the same.” So because legitimate publishers have a hard time competing with that, maybe their quality goes down. But those that don’t, there’s more money out there for more fraud.

When we find a vehicle that’s running, it ends up being 50-60% of a media buy for a short period of time, a week or two. At that point, the legitimate traffic is the outlier. That’s going to exacerbate the situation.

Because it’s so labor-intensive, is the solution scalable?

AC: We use the tools to help amplify and speed up the process. The more traditional approach is, you get some ideas about this, you develop a few sensors, you roll them out and wait a few days for some results to come back. It’s like fumbling around in the dark. If you’re waiting three days to resolve, you lose the momentum.

Although it’s labor-intensive, there are ways to scale that.

How do you scale that process?

GC: We’ve spent the last five or six years building this interactive system, working with the ad player environment to do it live so we can continuously iterate and roll out. What Alex referred to – that latency where you keep trying something and waiting, trying something and waiting – the campaign is over before you get any results. If you can remove the waiting, it goes quick. The tricky part is: How do you do that without potentially breaking people’s ads? How do you do that on a global network?

How do you do that on a global network without breaking people’s ads?

GC: Some of our best tools work like search engines. You type a query in and it will stream out sessions that are going on now with no lag. Within one to two seconds, you can then use another tool to say: “Here are the parameters I’m looking for.” These run in seconds.

Who do you work with?

GC: Telemetry is brand-direct. We don’t work specifically for the agencies. At the highest possible level we have to give them policy advice. If we go to the vendor first, the MRC’s own guidelines say we simply filter it moving forward. That’s not sustainable, in my opinion.

There’s always going to be more fraudsters out there. We try to work with a lot of vendors, whether they be ad networks or even publishers. Their big struggle is not being able to utilize the information we give.

Why is that?

GC: We record everything our sensors store, like basically forever. We can always go back and break it down in different ways. The networks might not store this because it’s inconvenient. It’s a lot of data. We accrue an enormous amount of data every day. I think right now we’re at one and a half terabytes of compressed log files every day. It’s an unwieldy amount of data and you have to make the data important to your company in order to spend that type of energy on it.

So that gives you a larger perspective?

AC: It’s historical definitely. When we try to collate our logs with partners or publishers or networks, there’s quite often a lack of historical data. You might get one or two days back. But if you ask for a week or a fortnight back, it’s not there. They’ve just got some aggregate numbers related to billing points floating around, which makes the analysis really quite difficult, and that can become quite accusational, where people call us liars.

What happens when vendors protest?

GC: We get handed in to speak to them quite early on because they want to work together. It’s in everybody’s best interest to work together. When Alex says they don’t believe us, it’s skepticism based on the fact they want to confirm it. But there’s a data mismatch. What they’re able to confirm is counts by day and maybe part of the domain name, and only for a few days.

How far back do you typically look?

GC: We generally look back to the beginning of the campaign.

So when you look all the way back, and they can’t go all the way back, what’s the reaction?

GC: Our processes surrounding this have been reviewed as part of our MRC accreditation. We try to come up with something they can test for. In one situation, we were able to identify that the majority of the fraud was coming from five or six IP blocks, which was something a lot of vendors could act on. They weren’t aware they could because nobody had ever tried.
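To make the IP-block finding concrete, here is a minimal sketch of how flagged impressions might be grouped into address blocks to see whether a handful of blocks account for most of them. The /24 block size, the function names and the sample addresses (taken from reserved documentation ranges) are assumptions for illustration, not details from the interview.

```python
import ipaddress
from collections import Counter


def block_of(ip, prefix_len=24):
    """Return the IPv4 block (as text) that contains the given address."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def top_blocks(flagged_ips, prefix_len=24, top_n=6):
    """Count flagged impressions per block and return the most common blocks."""
    counts = Counter(block_of(ip, prefix_len) for ip in flagged_ips)
    return counts.most_common(top_n)


def coverage(flagged_ips, blocks, prefix_len=24):
    """Fraction of flagged impressions that fall inside the given blocks."""
    if not flagged_ips:
        return 0.0
    block_set = {name for name, _ in blocks}
    hits = sum(block_of(ip, prefix_len) in block_set for ip in flagged_ips)
    return hits / len(flagged_ips)


# Hypothetical flagged impressions, using documentation-only address ranges.
flagged = [
    "192.0.2.10", "192.0.2.77", "192.0.2.200",
    "198.51.100.5", "198.51.100.6",
    "203.0.113.9",
]
blocks = top_blocks(flagged, top_n=2)
print(blocks, coverage(flagged, blocks))
```

A short list of blocks with high coverage is the kind of output a vendor could test for and act on without needing the full session history.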

Anything vendors do to prevent fraud?

GC: There are two things they could probably do. When a new publisher comes along that wants to bring them a site, they should look at it. And then ease the amount of traffic it generates. I can’t think of a single situation where we’ve run into a site or vehicle that was believable. Like if you go to the site, it doesn’t have content, or it’s all nonsense. Or a huge cooking site in America that has only the same three Indian food recipes over and over again, it’s not real. It doesn’t hold up.

How do these sites work? A dummy site with bots firing off?

GC: Sometimes it’s bots. Or in the arbitrage situation Alex referred to earlier, these sites will be wiki sites or sites with only a few banner units floating around the bottom. You’re like: “Where’s the video ad?”

So saying they’re showing a video ad, when they’re only showing a banner ad.

GC: It’s just a stand-in, a dummy for what the real traffic is. I look at it and I don’t believe people go there. They’ve got a whole page of Facebook likes, but everything has one or two likes. Or a thousand likes. And it always remains the same, even if the traffic goes up and up and up. Human beings can intuitively make statements about this sort of thing.

This site came to me with 3,000 likes? It should have more next month if the traffic doubles. Why doesn’t it?

It seems like you could just build an algorithm to stop this, but you mentioned some sites are actively fooling those sensors?

GC: Yeah.

Is that the newest development?

GC: No, we saw sensor modification the first time we ran in China a few years ago. The traffic we saw there, they were modifying our ad server so they could deliver more of them. They weren’t getting all of them. A lot of them are timing based. We check to make sure the quartiles come at the right time, and those are harder to fool. It’s more like a canary that signifies something is wrong and it gets the attention. That was back at the end of 2010.
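The timing-based check mentioned above can be sketched as follows: a video player reports progress beacons at the start, 25%, 50%, 75% and end of the ad, and a checker confirms they arrive in order and close to when they should. The event names, the tolerance value and the function itself are assumptions; real playback can also buffer or pause, so a failed check is a canary for an analyst, not a verdict.

```python
QUARTILES = ("start", "first_quartile", "midpoint", "third_quartile", "complete")


def quartiles_plausible(events, duration_s, tolerance_s=1.5):
    """Check that quartile beacons arrive in order and near their expected offsets.

    events: dict mapping quartile name -> seconds since playback began.
    duration_s: length of the video creative in seconds.
    """
    previous = -1.0
    seen_gap = False
    for i, name in enumerate(QUARTILES):
        if name not in events:
            seen_gap = True               # the viewer may simply have left early
            continue
        if seen_gap:
            return False                  # a later beacon without an earlier one
        t = events[name]
        expected = duration_s * i / 4     # 0%, 25%, 50%, 75%, 100% of the ad
        if t <= previous or abs(t - expected) > tolerance_s:
            return False
        previous = t
    return True


honest = {"start": 0.1, "first_quartile": 7.6, "midpoint": 15.2,
          "third_quartile": 22.4, "complete": 30.1}
burst = {"start": 0.0, "first_quartile": 0.1, "midpoint": 0.2,
         "third_quartile": 0.3, "complete": 0.4}   # all beacons fired at once
print(quartiles_plausible(honest, 30), quartiles_plausible(burst, 30))
```

A script that fires every beacon immediately fails this check, while a viewer who abandons the ad partway through does not.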

Anything new over the last year?

GC: A lot of the interesting ones are the huge increase in video. I think the fact is, because the price difference between video and display is so big, there’s a natural arbitrage opportunity there. Before, people would do display-to-click. You’d see a lot of click fraud five or 10 years ago. They’d buy a display unit, sell a click unit and the user would go to a page and get spammed with fake clicks. So they passed the user test, the cookie test, but now the brand gets pissed off because they’re not being represented very well.

Now you’ve got a situation where video is going for $6-$8, $10 on real-time bidding buys. And you can pick up enough display to fill it, easily, for 10 or 20 cents. The only way you can make that work is by lying to the sensors, so it seems like a natural consequence.

Where is this process coming from? From the ad networks?

One particular brand has in their insertion orders that ad units won’t be 300×250. We all know what they mean by that. They say they want video units but not 300×250. Put it in the IO.

But then there’s a really big ad network that, in their medium size – they have small, medium and large. And their small size is 300×250. Their medium size is 351×251. Now, a human being would look at a 351×251 and say: “It’s just one pixel. That’s not medium.”

But on that particular network, I’d say 40-50% of the medium size ads are in that size. So you meet the exact letter and law of the IO, but unless someone looks at it, it gets through.

So what does the client do, since the network adheres to the terms of the contract?

GC: The network says they’re being gamed like everyone else. That network is publisher-supplied. They don’t have much information on it. They only want to spend two bits of storage to log the size of the ad. They didn’t want to store the actual width and height and do complicated byte matching. It was technologically easier for them to do this.

Now they’re in a situation where they just don’t know. And we don’t know until the unit is bought and we run our code.
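The storage trade-off described here can be illustrated with a small sketch: three size labels fit in two bits, but once only the label is logged, a 351×251 unit is indistinguishable from any other "medium" ad. The bucketing thresholds (including the 640×480 cutoff for "large") and the sample impressions are hypothetical; the interview does not give the network's actual rules.

```python
from enum import IntEnum


class SizeBucket(IntEnum):
    SMALL = 0
    MEDIUM = 1
    LARGE = 2   # three labels fit in two bits of storage


def bucket_for(width, height):
    """Hypothetical bucketing rule: 300x250 or smaller is small, 640x480 or
    larger is large, and everything in between is medium."""
    if width <= 300 and height <= 250:
        return SizeBucket.SMALL
    if width >= 640 and height >= 480:
        return SizeBucket.LARGE
    return SizeBucket.MEDIUM


# Hypothetical impressions as (width, height) pairs.
impressions = [(351, 251), (351, 251), (480, 360), (300, 250)]

# What a bucket-only log keeps: the first three rows look identical.
bucket_log = [bucket_for(w, h) for w, h in impressions]

# What logging real dimensions allows: measuring how much "medium" is 351x251.
medium = [(w, h) for w, h in impressions if bucket_for(w, h) == SizeBucket.MEDIUM]
share = medium.count((351, 251)) / len(medium)
print(bucket_log, share)
```

With only `bucket_log` available, neither the network nor the buyer can answer the question that the `share` calculation answers.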

How do you get a makegood?

GC: The advertiser says, “I won’t ever buy from you again.” It’s kind of one-sided I guess. But I suppose everyone quickly agrees 351×251 isn’t real. And the advertiser is so hungry for reach, they keep going back anyway.

Do you think there’ll be a standard around that?

GC: How do you make moral part of the standard? How do you put “No funny business” in the IO? I’d love to see someone try that. It’d be hysterical.

Could you put a range?

GC: One feature is we give brands a way to index different parts of their media, including ad size, qualitatively. To detach them from rates and ratios and view it as: This vendor is less gaming than this other vendor. Then we can weight really heavily that 351×251, so it’s disastrous to that part of the indexing.

What about an industry ranking?

GC: A lot of the vendors didn’t want to do that. Right now, it might be time for someone to try that again. Maybe fraud is the right angle to do that under. Amount of fraud on network. Actually, that’d be terrible. That can’t possibly be good news.

Would you need vendor buy-in to do that? Could it happen on the advertisers’ side?

GC: The advertisers will pay money for that kind of research, but don’t want to share it with other advertisers who won’t pay for it. Being able to package the product as guidance and policy level advice and as insulation for this kind of problem works commercially, even though it’s not efficient.

AC: It’s referring back to how labor-intensive any kind of investigation is. It’s another reason why having the advertiser pay for it or push these investigations is going to be awkward.

GC: It’s gotta be the publishers finding a way to do it and they’d have to do it simultaneously. Nobody will do it first.
