Data: Deja Vu All Over Again?

Tom Chavez is an entrepreneur, technologist, musician, and family man residing in San Francisco.  He was the founder and CEO of Rapt Inc., and following the acquisition of Rapt by Microsoft, he served as General Manager of Microsoft Advertising’s Online Publisher Business Group.

I’d like to take a moment to respond to Tolman Geffs’ recent query, “Are Publishers &@%$#ed?” (PDF)

The short answer is:  Yes, unless they rise up and re-assert control of their data.

These days data seems to be on the lips of every player, principal, enabler, provider, intermediary, and hustler in digital media. There’s no question that the decoupling of media from data is a dislocation that creates opportunity or disaster for publishers.  Advertisers and agencies can now purchase tonnage inventory from a social media or portal provider for dimes and nickels and transform it into $5-$8 inventory by meshing it with their own or third-party data.   The net effects from the publisher perspective:  (1) downward pressure on media prices; (2) new revenue opportunity from data.

In my last company, which I think of as a Round 1 digital media concern, we were fortunate to work with a number of savvy publishers who successfully monetized premium guaranteed inventory at scale, kept their reliance on ad networks in check, and emerged with limbs intact from the worst economic downturn in decades.  It was tough, though, to watch poor channel/reseller management degrade eCPMs across the industry.  This was particularly true outside a narrow band of best-practice publishers who, through smart people and smart tools, kept the media middlemen in check.  If physical goods followed the same practices as digital media between 2000 and 2007, Intel would sell its newest, hottest chips for $100 to Dell and HP and simultaneously put them on barges to Asia for resale at $1.

We’re in Round 2 now, and from my perch, it would be a shame to watch publishers repeat with data the mistake made with media in Round 1: namely, middlemen who got too much for too little.

I think there’s a simple, counter-intuitive reason why that’s unlikely to happen, and it has little to do with technology buildout or market adoption.  I might get kicked under the table for saying it, but there’s a silver lining to FTC oversight and the possibility of congressional action.  Even if publishers were ready to start moving data full-throttle into reseller channels, networks, and exchanges, the current privacy environment wouldn’t allow it.

Before they can really attack data monetization, website operators need to solve more immediate questions in data leakage, governance, valuation, and control. They need to give consumers and regulators confidence that the data is being managed responsibly, and they need to understand and assert the data’s true value to the market.  In Tolman’s terms, publishers need to “manage and capture the value of the data they originate” as the precondition to all the possibilities that follow.

Or, as they say in Oakland, you’ve got to pay your dues before you change your shoes.

Step 1 = management.  Step 2 = monetization.

Moving beyond Soviet-era publisher technology

With the emergence of DSPs, the buy-side of digital media has, almost overnight, armed itself with increasingly sophisticated tooling for segmentation, real-time bidding, and ROI analytics.  Publishers, meanwhile, are left with dated platforms architected to manage ads, not data.  Existing publisher systems simply don’t scale to accommodate the avalanche of audience data from sources such as

  • online interaction with content
  • user registration databases
  • profile and behavioral third-party sources
  • offline providers

and generated from devices such as:

  • PCs,
  • mobile handhelds,
  • tablets,
  • set top boxes, and
  • game devices.

No one wants to see that precious data scattered to the winds, regulators and lawmakers especially.  Capturing, tracking, and processing the torrent of publisher real-time data in an integrated system of record constitutes one of the hardest Big Data challenges in web computing today.  Incumbents seeking to attack it by retrofitting Soviet-era applications developed for Round 1 face certain despair.

First, their hard-working, resource-constrained development teams haven’t yet had time to learn or adopt the emerging technical standards.  Second, the problem sets are fundamentally different, and the velocity and scale at which they unfold require new enabling technologies.  Round 1 media management – forecasting available inventory, pricing ads through guaranteed or auction-based methods, serving ads, analyzing their performance – worked decently well on big iron servers and relational databases. But Round 2 questions – capturing, ingesting, taxonomizing, matching, moving, monetizing, connecting, and certifying data – introduce processing, storage, and algorithmic complexities thousands of times beyond what current ad-centered systems support.

Third, and most importantly, most incumbents are already operating networks and exchanges that treat the publisher as a supplier, not a customer or partner.  They can dangle shimmery objects in front of the publisher and forswear their middleman’s ways, but make no mistake:  they’re in business to improve their own margins, not their suppliers’.

As publishers move to consolidate and retake control of their audience data, they need neutral partners free of incentive conflicts and uncluttered by salespeople peddling the publishers’ own media to advertisers.  They need technology that can support hyper-scale, real-time data collection and data processing across formats, sources, and devices.

To compete with the behemoths, they need systems architected from the ground up around cloud-scale technologies for distributed computation (Hadoop) and scale-out data management (e.g., Cassandra, Voldemort, Redis).  Only one, maybe two, incumbents can handle Round 2 scale – and they happen to be the players who raise the hair on the back of publishers’ necks the most.
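
The scale-out pattern these technologies embody can be illustrated with a toy map/reduce pass over audience events.  This is a hypothetical sketch in plain Python with made-up event data; a real deployment would run the same logic on Hadoop, distributed across many machines and vastly larger volumes:

```python
from collections import defaultdict

# Toy audience events; in production these would stream in from many
# sites and devices at far greater volume.
events = [
    {"user": "u1", "segment": "auto-intender"},
    {"user": "u2", "segment": "auto-intender"},
    {"user": "u3", "segment": "travel"},
]

def map_phase(event):
    # Emit a (key, value) pair per event, as a Hadoop mapper would.
    yield (event["segment"], 1)

def reduce_phase(key, values):
    # Sum all values for a key, as a Hadoop reducer would.
    return (key, sum(values))

# Shuffle: group mapper output by key before reducing.
grouped = defaultdict(list)
for event in events:
    for key, value in map_phase(event):
        grouped[key].append(value)

segment_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(segment_counts)  # {'auto-intender': 2, 'travel': 1}
```

The point of the pattern is that map and reduce are independent per key, so the work parallelizes across a cluster without changing the logic.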

After the Big Data foundation comes bona fide Data Management & Monetization

Scalable storage and computing is only half the solution; the other half is about transforming the publisher’s raw data exhaust into market-ready product, managing its inbound and outbound functions, and detecting when it’s leaked or stolen.  Tomorrow’s platform must deliver a flexible mechanism for collecting and integrating multiple sources of user data, regardless of source, format, or device.  Further, that platform must provide the facility to organize the primary saleable attributes within the dataset (e.g., demographic, psychographic, behavioral, intent) into actionable, portable form.

  • The inbound challenge is about ingesting, tracking, and integrating third-party data sources (e.g., purchased data from external online/offline sources, registration data from internal offline sources) into the publisher’s data store.
  • The outbound challenge is about audit trailing the data publishers provide to external buyers and partners and certifying its use to prevent abuse or theft.  If a publisher enters into a contract with a buyer to deliver access to a particular slice of data for a fixed period of time, at the time of expiry that publisher needs analytic verification that the buyer no longer has access to the data in question.
  • Even more urgent than verification is the question of data leakage: publishers need to detect when their data is being skimmed or stolen and measure the impact of data leakage on website and revenue performance.
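
One way to picture the outbound verification problem: keep an auditable record of every grant of data access, with its expiry, and flag any observed access after the term ends.  The class and field names below are hypothetical, a minimal sketch rather than a production design:

```python
from datetime import datetime

class DataGrant:
    """Hypothetical record of a data-access grant to an external buyer."""

    def __init__(self, buyer, segment, expires_at):
        self.buyer = buyer
        self.segment = segment
        self.expires_at = expires_at
        self.access_log = []  # timestamps of observed accesses

    def record_access(self, when):
        self.access_log.append(when)

    def violations(self):
        # Any access after expiry is a contract violation the publisher
        # needs analytic verification to catch.
        return [t for t in self.access_log if t > self.expires_at]

grant = DataGrant("buyer-x", "auto-intender",
                  expires_at=datetime(2010, 6, 30))
grant.record_access(datetime(2010, 6, 1))   # within the term
grant.record_access(datetime(2010, 7, 15))  # after expiry
print(len(grant.violations()))  # 1
```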

Any publisher contemplating Tolman’s question shouldn’t sleep well at night without being able to tick off the following list of data platform ‘musts.’

  • A single, cloud-scale environment in which to harness, manage, and move audience data across formats, sources, and devices
  • The confidence to know that the data is being managed with the same technology that behemoths like Google and Amazon use to run their computing grids
  • Policy-managed control and optimization of inbound/outbound data flow
  • Regulation-proof consumer data infrastructure that prevents data from flowing into places where it shouldn’t and that delivers real-time targeting consistent with business policy, consumer preference, and regulatory expectations
  • A single version of a user’s profile, through a publisher-owned cookie, ensuring portability, secure access, protection, and efficient transfer and monetization of audience data internet-wide and across multiple devices
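
The policy-managed control point above can be pictured as a rules check that every outbound data flow must pass, with consumer preference overriding business policy.  The policy table and function names here are hypothetical, a sketch of the idea rather than any real system:

```python
# Hypothetical policy table: which attribute classes may flow to which
# destination types, per business policy and regulatory expectations.
POLICY = {
    ("behavioral", "ad_network"): True,
    ("behavioral", "data_exchange"): False,
    ("registration", "ad_network"): False,
}

def may_flow(attribute_class, destination, user_opted_out=False):
    # Consumer preference trumps business policy: an opt-out blocks
    # the flow regardless of what the policy table allows.
    if user_opted_out:
        return False
    # Default-deny anything the policy table doesn't explicitly allow.
    return POLICY.get((attribute_class, destination), False)

print(may_flow("behavioral", "ad_network"))        # True
print(may_flow("registration", "ad_network"))      # False
print(may_flow("behavioral", "ad_network", True))  # False
```

The default-deny lookup is the important design choice: data stays put unless a policy explicitly permits the flow.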

While I don’t think anyone can claim to have all the answers yet, a problem well-framed is a problem half-solved. With the right data infrastructure in place, publishers can open up regulation-proof revenue streams worth hundreds of millions of dollars.  It would be a shame if some new channel master strip-mined their audience of its emerging data value without giving them their due.



  1. Jason Kelly


A phenomenal post that captures the state of the state around the data ecosystem and the importance for publishers to have a proactive strategy that covers off on “data leakage, governance, valuation and control” as you say.

The challenge remains that even with a well-thought-out strategy, there are so few independent entities in the marketplace that publishers can turn to for solving the needs detailed in your data platform must list.

In any case, it is great to see an increasing level of focus and conversation on the sell side. I am hopeful that this will result in additional capabilities and technology solutions for publishers, enabling us to take a more active role as both inventory AND data management players within the ecosystem and driving incremental revenue opportunities for the sell side.

    To extend your comment(s) a bit:

    1) If you can’t measure it you can’t manage it
    2) If you can’t manage it you can’t monetize it
    3) Those that can manage it (Networks, Exchanges, DSPs, Etc.) will monetize it for and around you leaving you behind

    Thanks again for a great post Tom.


  2. Bravo! This is a Manifesto for Pubs and it’s so spot on I’m getting suspicious that you stole our business plan. 😉

  3. Well reasoned and well written – great job. No doubt that publishers will need sophisticated tools to manage their data and prevent leakage. To me the interesting question is whether or not they’ll be able to effectively monetize it separately from their premium context. I guess we’ll find out…

  4. Tom, thanks for a well thought through article.

In my view, the challenge for publishers is that they don’t see the entire spectrum of user interest/activity on the web, and therefore can’t target very effectively.

Don’t get me wrong: publishers have a very valuable nugget of insight about the user; i.e., the user interest demonstrated by his/her interaction on their properties.

    Publishers could look at Data Aggregators as a resource. Source audience intelligence to understand a particular user on your site when they arrive, and share (or sink) data back when you have a meaningful insight that is not proprietary.

    As a contributor of insights, Publishers could assert stringent controls and requirements for how that data is shared with others, and look to Data Aggregators as a revenue source.

    In the offline world, companies like Acxiom and Experian pay large sums to gain access to information from retailers. The power balance in the online world is lopsided at the moment.

    In the broader context, Publishers need all the audience intelligence they can get – from the data aggregators, their own site analytics, and other sources that have foundation data about geo, demo, social, etc.

The next 12 months will be very exciting in this space.

    • Search has proven you *do not* need the entire spectrum of user interest/activity on the web to target effectively. In fact the broad spectrum has poor signal/noise ratio as the new data players are bearing witness to now.

      • Jonathan, challenge on that comment….

Search prices are rising and becoming more and more competitive. As you dig into media plans, you realize you wish you could track to the impression level, and the more competition increases in search, the more the need develops to target effectively there.

        It’s becoming easier and easier to over-spend in search and harder and harder to separate the wheat from the chaff. So I think using search as a pillar of reason to support your valid comment is a progressively slippery slope.

      • Brian,

        I was not referring to the pricing component of Search but the fact that targeting/delivery of relevance that’s accomplished in that channel — a channel devoid of cookie based rules/matching and with a common interface and creative component for everyone using the channel — produces the highest degrees of non PII relevance and matching this medium has to offer.

  5. Mike Blacker

Tom – well said! Our industry continues to overlook the fact that the tier-1 publishers still control most of the inventory and most of the data. I think the stat is that 90% of all inventory and revenue still resides with the top 20 pubs? How many tier-1 publishers share data into the marketplace? Very few…if any. How many tier-1 publishers still share inventory into the marketplace? Very few.

It seems all the excitement about the exchange/RTB/DSP/data/network space is really about splitting hairs. I think that is why everyone was so alarmed by Terence Kawaja’s now-famous Savvian slide. This industry tends to be excited by shiny objects and forgets that the tier-1 publishers still control most of the pie.

Not to mention, the entire data landscape is predicated on cookies. Not only do they get deleted 30-40% of the time, but the government is starting to take a keen interest and will likely protect consumers in some way (TBD). BT still accounts for less than 3% of overall spend online, yet it seems that we spend 99% of our time focused on it. BT data is great for advertisers that rely on intent (travel, auto, etc.), but the “behavioral ceiling” doesn’t help P&G sell toothpaste or Unilever double their online spend to promote Lipton Iced Tea.

    Once this industry has the ability to accurately and effectively deliver demographics (not inferred based on BT data) and geo-location (not based on an IP address) the tides will turn. Overall I think the tier-1 publishers are in a great position to weather this storm and regain control of the marketplace.

    • Tom Chavez


      I agree with your observation that the Top 20 pubs still account for the lion’s share of revenue, data, and impressions. You could well be right that the age of shimmery objects and middlemen with reverse-double-twist business models is coming to an end. I agree, it’s hard to see how all the players in Terence’s slide, especially the guys in the middle, carry on.

      The specific worry I have — and the reason why things might not be as shiny for the Top 20 as you suggest — is that so many of the players in Terence’s slide are focused on strip-mining the Top 20 of valuable audience data. By unbundling the publisher’s value proposition – audience data + content + environment – and leaving the publisher with just content + environment, those middlemen erode net price for the publisher’s media and reduce his ability to generate content that attracts valuable audiences in the first place.

      Too many publishers could be feeding the cow and turning the other way while strangers come in and milk her for free. If that continues, we’ll have to redo the count and start referring to the Top 200 instead of the Top 20.