Data: Deja Vu All Over Again?

Tom Chavez is an entrepreneur, technologist, musician, and family man residing in San Francisco.  He was the founder and CEO of Rapt Inc., and following the acquisition of Rapt by Microsoft, he served as General Manager of Microsoft Advertising’s Online Publisher Business Group.

I’d like to take a moment to respond to Tolman Geffs’ recent query, “Are Publishers &@%$#ed?” (PDF)

The short answer is:  Yes, unless they rise up and re-assert control of their data.

These days data seems to be on the lips of every player, principal, enabler, provider, intermediary, and hustler in digital media. There’s no question that the decoupling of media from data is a dislocation that creates opportunity or disaster for publishers.  Advertisers and agencies can now purchase tonnage inventory from a social media or portal provider for dimes and nickels and transform it into $5-$8 inventory by meshing it with their own or third-party data.   The net effects from the publisher perspective:  (1) downward pressure on media prices; (2) new revenue opportunity from data.

In my last company, which I think of as a Round 1 digital media concern, we were fortunate to work with a number of savvy publishers who successfully monetized premium guaranteed inventory at scale, kept their reliance on ad networks in check, and emerged with limbs intact from the worst economic downturn in decades.  It was tough, though, to watch poor channel/reseller management degrade eCPMs across the industry.  This was particularly true outside a narrow band of best-practice publishers who, through smart people and smart tools, kept the media middlemen in check.  If physical goods followed the same practices as digital media between 2000 and 2007, Intel would sell its newest, hottest chips for $100 to Dell and HP and simultaneously put them on barges to Asia for resale at $1.

We’re in Round 2 now, and from my perch, it would be a shame to watch publishers repeat with data the mistake made with media in Round 1: namely, middlemen who got too much for too little.

I think there’s a simple, counter-intuitive reason why that’s unlikely to happen, and it has little to do with technology buildout or market adoption.  I might get kicked under the table for saying it, but there’s a silver lining to FTC oversight and the possibility of congressional action.  Even if publishers were ready to start moving data full-throttle into reseller channels, networks, and exchanges, the current privacy environment wouldn’t allow it.

Before they can really attack data monetization, website operators need to solve more immediate questions in data leakage, governance, valuation, and control. They need to give consumers and regulators confidence that the data is being managed responsibly, and they need to understand and assert the data’s true value to the market.  In Tolman’s terms, publishers need to “manage and capture the value of the data they originate” as the precondition to all the possibilities that follow.

Or, as they say in Oakland, you’ve got to pay your dues before you change your shoes.

Step 1 = management.  Step 2 = monetization.

Moving beyond Soviet-era publisher technology

With the emergence of DSPs, the buy-side of digital media has, almost overnight, armed itself with increasingly sophisticated tooling for segmentation, real-time bidding, and ROI analytics.  Publishers, meanwhile, are left with dated platforms architected to manage ads, not data.  Existing publisher systems simply don’t scale to accommodate the avalanche of audience data from sources such as

  • online interaction with content
  • user registration databases
  • profile and behavioral third-party sources
  • offline providers

and generated from devices such as:

  • PCs,
  • mobile handhelds,
  • tablets,
  • set top boxes, and
  • game devices.

No one wants to see that precious data scattered to the winds, regulators and lawmakers especially.  Capturing, tracking, and processing the torrent of publisher real-time data in an integrated system of record constitutes one of the hardest Big Data challenges in web computing today.  Incumbents seeking to attack it by retrofitting Soviet-era applications developed for Round 1 face certain despair.

First, their hard-working, resource-constrained development teams haven’t yet had time to learn or adopt the emerging technical standards.  Second, the problem sets are fundamentally different, and the velocity and scale at which they unfold require new enabling technologies.  Round 1 media management – forecasting available inventory, pricing ads through guaranteed or auction-based methods, serving ads, analyzing their performance – worked decently well on big iron servers and relational databases. But Round 2 questions – capturing, ingesting, taxonomizing, matching, moving, monetizing, connecting, and certifying data – introduce processing, storage, and algorithmic complexities thousands of times beyond what current ad-centered systems support.

Third, and most importantly, most incumbents are already operating networks and exchanges that treat the publisher as a supplier, not a customer or partner.  They can dangle shimmery objects in front of the publisher and forswear their middleman’s ways, but make no mistake:  they’re in business to improve their own margins, not their suppliers’.

As publishers move to consolidate and retake control of their audience data, they need neutral partners free of incentive conflicts and uncluttered by salespeople peddling the publishers’ own media to advertisers.  They need technology that can support hyper-scale, real-time data collection and data processing across formats, sources, and devices.

To compete with the behemoths, they need systems architected from the ground up around cloud-scale technologies for distributed computation (Hadoop) and scale-out data management (e.g., Cassandra, Voldemort, Redis).  Only one, maybe two, incumbents can handle Round 2 scale – and they happen to be the players who raise the hair on the back of publishers’ necks the most.
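
The scale-out pattern these technologies embody can be illustrated with a toy map/reduce pass over audience events.  This is a hypothetical sketch in plain Python with made-up event data; a real deployment would run the same logic on Hadoop, distributed across many machines and vastly larger volumes:

```python
from collections import defaultdict

# Toy audience events; in production these would stream in from many
# sites and devices at far greater volume.
events = [
    {"user": "u1", "segment": "auto-intender"},
    {"user": "u2", "segment": "auto-intender"},
    {"user": "u3", "segment": "travel"},
]

def map_phase(event):
    # Emit a (key, value) pair per event, as a Hadoop mapper would.
    yield (event["segment"], 1)

def reduce_phase(key, values):
    # Sum all values for a key, as a Hadoop reducer would.
    return (key, sum(values))

# Shuffle: group mapper output by key before reducing.
grouped = defaultdict(list)
for event in events:
    for key, value in map_phase(event):
        grouped[key].append(value)

segment_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(segment_counts)  # {'auto-intender': 2, 'travel': 1}
```

The point of the pattern is that map and reduce are independent per key, so the work parallelizes across a cluster without changing the logic.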

After the Big Data foundation comes bona fide Data Management & Monetization

Scalable storage and computing is only half the solution; the other half is about transforming the publisher’s raw data exhaust into market-ready product, managing its inbound and outbound functions, and detecting when it’s leaked or stolen.  Tomorrow’s platform must deliver a flexible mechanism for collecting and integrating multiple sources of user data, regardless of source, format, or device.  Further, that platform must provide the facility to organize the primary saleable attributes within the dataset (e.g., demographic, psychographic, behavioral, intent) into actionable, portable form.

  • The inbound challenge is about ingesting, tracking, and integrating third-party data sources (e.g., purchased data from external online/offline sources, registration data from internal offline sources) into the publisher’s data store.
  • The outbound challenge is about audit trailing the data publishers provide to external buyers and partners and certifying its use to prevent abuse or theft.  If a publisher enters into a contract with a buyer to deliver access to a particular slice of data for a fixed period of time, at the time of expiry that publisher needs analytic verification that the buyer no longer has access to the data in question.
  • Even more urgent than verification is the question of data leakage: publishers need to detect when their data is being skimmed or stolen and measure the impact of data leakage on website and revenue performance.
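
One way to picture the outbound verification problem: keep an auditable record of every grant of data access, with its expiry, and flag any observed access after the term ends.  The class and field names below are hypothetical, a minimal sketch rather than a production design:

```python
from datetime import datetime

class DataGrant:
    """Hypothetical record of a data-access grant to an external buyer."""

    def __init__(self, buyer, segment, expires_at):
        self.buyer = buyer
        self.segment = segment
        self.expires_at = expires_at
        self.access_log = []  # timestamps of observed accesses

    def record_access(self, when):
        self.access_log.append(when)

    def violations(self):
        # Any access after expiry is a contract violation the publisher
        # needs analytic verification to catch.
        return [t for t in self.access_log if t > self.expires_at]

grant = DataGrant("buyer-x", "auto-intender",
                  expires_at=datetime(2010, 6, 30))
grant.record_access(datetime(2010, 6, 1))   # within the term
grant.record_access(datetime(2010, 7, 15))  # after expiry
print(len(grant.violations()))  # 1
```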

Any publisher contemplating Tolman’s question shouldn’t sleep well at night without being able to tick off the following list of data platform ‘musts.’

  • A single, cloud-scale environment in which to harness, manage, and move audience data across formats, sources, and devices
  • The confidence to know that the data is being managed with the same technology that behemoths like Google and Amazon use to run their computing grids
  • Policy-managed control and optimization of inbound/outbound data flow
  • Regulation-proof consumer data infrastructure that prevents data from flowing into places where it shouldn’t and that delivers real-time targeting consistent with business policy, consumer preference, and regulatory expectations
  • A single version of a user’s profile, through a publisher-owned cookie, ensuring portability, secure access, protection, and efficient transfer and monetization of audience data internet-wide and across multiple devices
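
The policy-managed control point above can be pictured as a rules check that every outbound data flow must pass, with consumer preference overriding business policy.  The policy table and function names here are hypothetical, a sketch of the idea rather than any real system:

```python
# Hypothetical policy table: which attribute classes may flow to which
# destination types, per business policy and regulatory expectations.
POLICY = {
    ("behavioral", "ad_network"): True,
    ("behavioral", "data_exchange"): False,
    ("registration", "ad_network"): False,
}

def may_flow(attribute_class, destination, user_opted_out=False):
    # Consumer preference trumps business policy: an opt-out blocks
    # the flow regardless of what the policy table allows.
    if user_opted_out:
        return False
    # Default-deny anything the policy table doesn't explicitly allow.
    return POLICY.get((attribute_class, destination), False)

print(may_flow("behavioral", "ad_network"))        # True
print(may_flow("registration", "ad_network"))      # False
print(may_flow("behavioral", "ad_network", True))  # False
```

The default-deny lookup is the important design choice: data stays put unless a policy explicitly permits the flow.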

While I don’t think anyone can claim to have all the answers yet, a problem well-framed is a problem half-solved. With the right data infrastructure in place, publishers can open up regulation-proof revenue streams worth hundreds of millions of dollars.  It would be a shame if some new channel master strip-mined their audience of its emerging data value without giving them their due.



  1. Jason Kelly


A phenomenal post that captures the state of the state around the data ecosystem and the importance for publishers to have a proactive strategy that covers off on “data leakage, governance, valuation and control” as you say.

The challenge remains that even with a well-thought-out strategy, there are so few independent entities in the marketplace that publishers can turn to for solving the needs detailed in your data platform must list.

In any case, it is great to see an increasing level of focus and conversation on the sell side. I am hopeful that this will result in additional capabilities and technology solutions for publishers, enabling us to take a more active role as both inventory AND data management players within the ecosystem and driving incremental revenue opportunities for the sell side.

    To extend your comment(s) a bit:

    1) If you can’t measure it you can’t manage it
    2) If you can’t manage it you can’t monetize it
    3) Those that can manage it (Networks, Exchanges, DSPs, Etc.) will monetize it for and around you leaving you behind

    Thanks again for a great post Tom.


  2. Bravo! This is a Manifesto for Pubs and it’s so spot on I’m getting suspicious that you stole our business plan. 😉

  3. Well reasoned and well written – great job. No doubt that publishers will need sophisticated tools to manage their data and prevent leakage. To me the interesting question is whether or not they’ll be able to effectively monetize it separately from their premium context. I guess we’ll find out…

  4. Tom, thanks for a well thought through article.

In my view, the challenge for publishers is that they don’t see the entire spectrum of user interest/activity on the web, and therefore can’t target very effectively.

Don’t get me wrong: publishers have a very valuable nugget of insight about the user; i.e., the user interest demonstrated by his/her interaction on their properties.

    Publishers could look at Data Aggregators as a resource. Source audience intelligence to understand a particular user on your site when they arrive, and share (or sink) data back when you have a meaningful insight that is not proprietary.

    As a contributor of insights, Publishers could assert stringent controls and requirements for how that data is shared with others, and look to Data Aggregators as a revenue source.

    In the offline world, companies like Acxiom and Experian pay large sums to gain access to information from retailers. The power balance in the online world is lopsided at the moment.

    In the broader context, Publishers need all the audience intelligence they can get – from the data aggregators, their own site analytics, and other sources that have foundation data about geo, demo, social, etc.

The next 12 months will be very exciting in this space.

    • Search has proven you *do not* need the entire spectrum of user interest/activity on the web to target effectively. In fact the broad spectrum has poor signal/noise ratio as the new data players are bearing witness to now.

      • Jonathan, challenge on that comment….

Search prices are rising and becoming more and more competitive. As you dig into media plans, you realize you wish you could track to the impression level, and the more competition increases in search, the more the need develops to target effectively there.

        It’s becoming easier and easier to over-spend in search and harder and harder to separate the wheat from the chaff. So I think using search as a pillar of reason to support your valid comment is a progressively slippery slope.

      • Brian,

        I was not referring to the pricing component of Search but the fact that targeting/delivery of relevance that’s accomplished in that channel — a channel devoid of cookie based rules/matching and with a common interface and creative component for everyone using the channel — produces the highest degrees of non PII relevance and matching this medium has to offer.

  5. Mike Blacker

Tom – well said! Our industry continues to overlook the fact that the tier-1 publishers still control most of the inventory and most of the data. I think the stat is that 90% of all inventory and revenue still resides with the top 20 pubs? How many tier-1 publishers share data into the marketplace? Very few…if any. How many tier-1 publishers still share inventory into the marketplace? Very few.

It seems all the excitement about the exchange/RTB/DSP/data/network space is really about splitting hairs. I think that is why everyone was so alarmed by Terence Kawaja’s now-famous Savvian slide. This industry tends to be excited by shiny objects and forgets that the tier-1 publishers still control most of the pie.

Not to mention, the entire data landscape is predicated on cookies. Not only do they get deleted 30-40% of the time, but the government is starting to take a keen interest and will likely protect consumers in some way (TBD). BT still accounts for less than 3% of overall spend online, yet it seems that we spend 99% of our time focused on it. BT data is great for advertisers that rely on intent (travel, auto, etc.), but the “behavioral ceiling” doesn’t help P&G sell toothpaste or Unilever double their online spend to promote Lipton Iced Tea.

    Once this industry has the ability to accurately and effectively deliver demographics (not inferred based on BT data) and geo-location (not based on an IP address) the tides will turn. Overall I think the tier-1 publishers are in a great position to weather this storm and regain control of the marketplace.

    • Tom Chavez


      I agree with your observation that the Top 20 pubs still account for the lion’s share of revenue, data, and impressions. You could well be right that the age of shimmery objects and middlemen with reverse-double-twist business models is coming to an end. I agree, it’s hard to see how all the players in Terence’s slide, especially the guys in the middle, carry on.

      The specific worry I have — and the reason why things might not be as shiny for the Top 20 as you suggest — is that so many of the players in Terence’s slide are focused on strip-mining the Top 20 of valuable audience data. By unbundling the publisher’s value proposition – audience data + content + environment – and leaving the publisher with just content + environment, those middlemen erode net price for the publisher’s media and reduce his ability to generate content that attracts valuable audiences in the first place.

      Too many publishers could be feeding the cow and turning the other way while strangers come in and milk her for free. If that continues, we’ll have to redo the count and start referring to the Top 200 instead of the Top 20.