Publishers And The Hidden Costs Of Data Leakage

Tom Chavez is an entrepreneur, technologist, musician, and family man residing in San Francisco.  He was the founder and CEO of Rapt Inc., and following the acquisition of Rapt by Microsoft, he served as General Manager of Microsoft Advertising’s Online Publisher Business Group.

I have received an overwhelming amount of feedback from my last missive.  I heard from friends and colleagues in the publisher space, DSPs, agencies, and technology and service providers.   There were hugs, high fives, bricks, and a few rotten tomatoes – all of it instructive.  Ultimately I’m just glad to see some really smart people thinking about the challenges publishers face in an increasingly buyer-centric industry.

In my last note, I asserted that:

With the right data infrastructure in place, publishers can open up regulation-proof revenue streams worth hundreds of millions of dollars.  It would be a shame if some new channel master strip-mined their audience of its emerging data value without giving them their due.

Before publishers can start claiming value from data, they need to get a handle on what they’re losing.  Revenue Exposure is a metric that I have been exploring with my friend and colleague Vivek Vaidya, formerly CTO at Rapt, and Andy Skrzypacz, Professor of Economics at Stanford.  Our goal in this exercise is about more than intellectual stimulation; we’re dead-set on measuring the potential costs to digital media publishers of data collection across their websites.

At the core of Revenue Exposure is the notion of ‘missed’ demand.  By that I mean the downstream impact of others collecting data about a publisher’s users today, repackaging that data, combining it with tonnage media, and then using it to sell against that publisher in the future.  In essence, new sellers of like-kind assets emerge who shave points off the share-of-spend captured by the publisher, all of it energized by the publisher’s own data.

Of course, industry practitioners already know that this isn’t a conceptual scenario.  It is the hinge point to ad network arbitrage economics, oxygen for DSPs, and critical fuel for audience buying and remarketing/re-targeting for advertisers across the board.

I do not question the targeting tactic itself; it’s exactly the right way to leverage data to improve campaign performance and customize advertising, content, and commerce experiences for the end user.  What concerns me and a growing number of folks within the digital media community is that much of this activity is not occurring in the plain light of day.  Of course that raises serious privacy questions, and it’s increasingly urgent that our industry address them.  But at least as importantly, the data owners – usually the publishers, and perhaps the end consumer – may not be getting fairly compensated for the use of that data downstream.

Insurers use sophisticated actuarial measures to manage risk and adjust policy-setting and selling practices.  High-tech companies carefully measure book-to-bill and channel sell-through prices to ensure the long-term value of their chips and devices. It’s time for publishers to bring similar discipline to their own businesses.

Revenue Exposure is, in essence, an indicator of the publisher’s opportunity cost from data leakage or unauthorized data collection.  A high-tech company might reduce the price of a component through a reseller channel to spur more short-term revenue, but that introduces the long-term risk of accelerated price erosion.  Similarly, a publisher can allow more data collection on his website today, but with potentially serious downstream consequences.  To understand them, we need to first briefly review the economics of the publishing business.

Publishers have three assets.  The first is content, whether they create it, acquire it from others, or invite their audience to contribute it.  Ultimately, they create, collect, curate, and present content that is of interest to a particular audience.  Second, they have environment, representing the essential and unique qualities of the manner in which the content is presented.  Consider the perceived differences between a glossy fashion magazine and a supermarket tabloid.  They might both run a cover story about the same celebrity, but the quality of presentation and the inherent value imparted by the publications’ brands are different.  These differences shape the relationships those two publishers develop with both their readers and their advertisers.

Finally, a publisher has the all-important asset, audience.  Given the quality and usefulness of the information on offer (content) and the manner in which that content is presented (environment), a publisher aims to grow a large and loyal audience.  When an advertiser purchases ad space from the publisher, it seeks to reach that audience with a particular message.  Further, the advertiser may even be willing to pay a premium for placements based on the quality of the environment.  The publication’s brand, history, and production quality all lend confidence to the advertiser that its message will not only reach the desired audience, but will also be presented to the audience in a pleasing way.

This is the fundamental economic model underpinning all ad-supported publishing, online and offline.  In exchange for useful content and a satisfying experience, users consume advertising.  To reach the right audience, advertisers pay for ads, thereby underwriting the publisher’s operating costs required to create, collect, curate, and present content.

Good Data, Cheap Media

Technological advancements and new business models disrupt that fundamental economic balance, jeopardizing the equitable value exchange between publisher, audience, and advertiser.  Through the use of cookies and tracking pixels, ad networks, DSPs, and individual advertisers can now identify and track specific users across the internet.  Often, they use those tracking techniques to gain ‘backdoor’ access to valuable user information without the publisher’s knowledge or authorization.

Whenever a third or fourth party backdoors a pixel onto a publisher’s site, they can collect data about that site’s audience.  That data is used to build valuable audience segments by combining it with cheaper media from other websites, bypassing the original publisher altogether.  This “Good Data + Cheap Media” strategy poses at least two immediate economic problems for the publisher.

First, it degrades the value of the publisher’s media, as the third party collecting the data effectively strips it of its audience value.  By unbundling the publisher’s value proposition – audience + content + environment – and leaving the publisher with just content + environment, the third party effectively degrades net price for the publisher’s media.  By way of example, if consumers can get an equally good cup of coffee at a nearby fast food joint as at Starbucks for less than half the price, their willingness to pay a premium for Starbucks is eroded.  If coffee were like digital media, the fast food joint would also be pilfering Starbucks’ beans to make its half-price coffee.

Second, the data can be used many times over to deliver an audience of interest to advertisers.  Problem is, it’s acquired for free.  The publisher creates the content that attracted the audience in the first place, but they receive no compensation from the third party that monetizes it over and over again with every impression, every offer, every click.  In the realm of music, when a songwriter writes a song, he or she earns royalties every time that song is performed.  If songwriting were like digital publishing, the songwriter would write a great song and watch someone else perform it in stadiums, on radio, and on TV, over and over again, without earning a penny from the effort. 

Publishers who leak data leak money. The indirect loss may appear small on a pixel-by-pixel, cookie-by-cookie basis, but it accumulates into considerable sums as the scale and scope of third party data collection grows.  Revenue Exposure puts the numbers to the “Good Data + Cheap Media” dynamic illustrated above and ties it back to specific data collection activities across the channels and sections of a publisher’s website.

Digital media publishers need always-on, real-time visibility regarding the revenue risk resulting from data leakage on their sites. Ultimately, it’s up to them to determine the level of risk they want to carry in their own operations as a function of the data collection they allow or prevent.  Subject to whatever privacy regime emerges in the months ahead, publishers should, in my view, have the ability to make these decisions on their own.  But they’d be ill advised to make them in the absence of a more quantitative baseline to help measure and manage the resulting exposure.



  1. I think Tom has hit the nail on the head. What I believe he is referring to is a need for a “sell-side” platform. One that could potentially make “ad-supported” work for everyone.

    Content owners and creators originally sacrificed their audience value in exchange for traffic, an early ad metric. They commoditized their user data, giving it to third party networks, without extracting its true value. This drove down ad revenue as audience data was utilized and monetized on myriads of additional sites rather than the original content owners’.

    Many sites are now backpedaling, erecting pay-walls in order to recognize higher revenue, but it is unreasonable to expect a user such as myself to pay for a multitude of subscriptions.

    A compromise must be met.

    Three parties each have a need; users demand content, creators demand payment, marketers demand audience. I believe that markets can help by efficiently negotiating that compromise.

    Tom is correct. Publishers of all types have leaked massive amounts of money. It has always dumbfounded me that within the Internet (and mobile) era of ultra-addressability, digital ad revenue has continually traded at a discount. Perhaps with the creation of a “sell-side” platform, one that can extract the full value of an audience, publishers can plug the leak.

    Gulp Media

  2. Tom,

    My company (Brilig) is an advertising data marketplace that exists to enable publishers to directly control and monetize audience data (i.e. attributes and segments). My partner, the inventor of Brilig, Christopher Keith, was CTO of the NYSE for more than a decade. He foresaw that a transparent marketplace for data and the prediction of consumer attention and interest was inevitable so we built one. By transparent we mean that all parties (publisher, marketers and eventually even consumers) have full visibility and control of their data assets. Our unique approach is not to “give” data to buyers but rather to allow the buyers to ask (targeting) questions of complex sets of data (which are completely controlled by the pub/seller) thus preventing leakage. This approach enables all parties to maximize liquidity and profitability in a large dynamic market. I’d like to talk to you more about this if you are interested. I think we can help you with your study of the new economics of digital media. You can reach me at

  3. Tom, I think you have hit on an important topic here, although I believe that “leakage” is actually fairly far down on priorities of a publisher. The first step for a publisher is simply understanding their own audience! Today, most pubs don’t really understand their online audiences. Most of them grew up in the offline world where understanding audience was a critical part of doing business. This got lost in their shift to online, and it has cost them dearly as technology driven marketers/platforms have sucked the value from them. In order to effectively participate in this new world of data, bidding and advertiser access to audience, I believe the pubs need to reclaim their media and audience and only THEN begin to dole it out with full visibility into what they’re doing. This will give them a huge advantage over the current way of doing things and give them a seat at the table in garnering the true value of an impression.

    • Tom Chavez

      Thanks for the thoughtful comment, Russell.

      I agree that, for a publisher, understanding audience is absolutely key. It’s troublesome that so many middlemen have, arguably, done a better job of that these last 5-10 years than many publishers. I wonder, though: what’s the point of understanding an audience if you don’t own and control it? As I’m suggesting in related postings, I think Job 1 for publishers is to own their audiences before somebody else gets them for free. Understanding that you had 4M users with income > $500K after someone else skims them might satisfy curiosity, but it won’t improve your bottom line.

  4. Sabotosh

    Tom, great job articulating the publisher value proposition now that third parties can easily extract and re-use / re-sell audience data.

    Not going to shill; but if publishers are not working with adnets that actually pay for the use of data – it’s time to start asking questions.

    • Tom Chavez


      What adnets out there are paying for data right now?

      • While we are much more than just an adnet, we (AudienceScience) pay for our data. We also help Publishers know and monetize their audience.

  5. great post. i completely agree that the leakage of data is a critical issue (and may even be in contravention of many privacy laws). it speaks to two elements we have been playing with as publishers.

    1. yield management may be leading us the wrong way. in traditional media, scarcity created the value (i.e. rate card rates as hurdles and limited ad spots). in online our abundance of inventory has taken scarcity away. each publisher’s job is to define what his “premium” offering is and sell that without the use of any 3rd parties to maximise its value.
    2. the packaging of segments is a critical tool for publishers, and so data is a critical issue, but to make the cost of sale reasonable, you need automation. an SSP would be a great thing (rather than an exchange). (as an aside, aren’t exchanges traditionally managed on scarcity? i.e. very few companies with lots of reporting overheads mean that there are a limited number of places to invest your money thus share prices increase?) i completely get that in the future publishers will be selling segments of their audience at a higher yield per audience member than general audience. the question of how do we get there is still unanswered