Publishers, Stop Saying You’re ‘Sitting On A Pile Of Data’

“The Sell Sider” is a column written by the sell side of the digital media community.

Today’s column is written by Jeremy Hlavacek, head of global automated monetization at IBM Watson Advertising. Jeremy will speak at AdExchanger’s PROGRAMMATIC I/O conference on April 10-11 at the Marriott Marquis in San Francisco.

If you’ve watched a publisher panel in the programmatic, media or ad tech space over the last few years, I would be willing to bet that the phrase “sitting on a pile of data” was uttered at least once. It usually goes something like this:

Digital publisher: “We think we can be successful in the modern digital media market because we are sitting on a huge pile of very valuable data.”

The intention of this phrase is to suggest that the publisher has some kind of secret weapon or inborn advantage that will allow it to come out on top in today’s highly competitive ad sales market. It also gives the speaker a false sense of accomplishment and is somehow supposed to strike fear in the hearts of buyers and other competitive pubs.

There is a problem with this expression, however. It is completely meaningless.

Starting today, I am challenging the publishing industry to eliminate this statement from the conference lexicon. It is nails on a chalkboard to my panel-scarred ears.

Why do I have such a strong reaction to this seemingly innocent phrase? Let me explain by breaking the sentence down into three parts and showing why each piece reeks of bad strategy and a dangerous false confidence.

‘Sitting on’

What does the phrase “sitting on” imply? In my mind, it means “doing nothing.” A quick Google search reveals the Collins English Dictionary explanation: “If you say that someone is sitting on something, you mean that they are delaying dealing with it.”

I most often hear this phrase used in a derogatory way, such as, “The project is delayed because we are waiting for XYZ to make a decision, but they are sitting on it.”

Sadly, I think this definition is close to the truth for many publishers. They may have some useful data and a few ideas for how it could be valuable, but they are indeed sitting on it – that is, not making decisions about how to apply this asset to their business. It could be valuable for on-property campaigns or as an independent asset decoupled from media.

Or maybe both? Maybe there are operational insights in the data that could inform sales strategy? Or improve campaign setups and yield?

The intention of this sentence is to somehow suggest an advantage, and yet the phrase itself is about doing nothing. Unfortunately, I believe this is very much the reality for most publishers.

‘A pile of’

To me, these three words are usually used in phrases like “a pile of clothes on the floor.” Or “a pile of spare parts in a scrap yard.”

The phrase suggests a collection of random things that may or may not be similar. A pile is not organized in any particular way and is usually thrown together in a rush with little thought about what should go where and why. Some antonyms here could be “curated set” or “organized collection.”

This is a very accurate portrayal of most publishers’ data sets. Data is scattered across various systems and formats without much strategy for what should go where, and there are often gaps in consistency. Organizing for scale and strategic application is rarely considered upfront, and it’s common to find lots of “technical debt” accrued over the years because hacked solutions were put in place to save time and money.

Getting organized and understanding what data you have is even more critical in the era of the General Data Protection Regulation (GDPR) and increasing consumer skepticism about digital advertising and targeting due to repeated scandals. I personally think there will continue to be a healthy market in the future for appropriately collected and transparently sourced data sets, but most publishers are nowhere near ready to understand their “pile” from either an opportunity or risk perspective.

Suppose you are a publisher with a large first-party subscriber and identity database. How easily could you use that data to target your next mobile app campaign? It would probably be incredibly useful to a marketer client, but I suspect most publishers would shrug sheepishly if they had to explain how to pull that off from both an operational and compliance perspective.

What if you had a large and accurate historical location database? Could you easily pass it to an out-of-home vendor? Again, tons of opportunity there, but could you find what you need in your “pile” and successfully ship it to a client or partner?

If a key asset is in a “pile” somewhere, it is safe to assume that it is not being fully leveraged by the business.

‘Data’

At this point, I feel that the word “data” has become so overused that it has lost any agreed-upon meaning. The word data can refer to everything from clicks on a banner ad to cookies in a browser to location pings from an app to mobile IDFAs to you name it. The point here is that “data” is a non-specific word that doesn’t help anyone make progress.

Understanding what specific data sets publishers may own or produce and how to leverage them is critical. If the publisher has a sense of what is being produced and how that process happens, it suddenly becomes possible to consider ways to actually grow this valuable asset and then develop and deploy a compliant and beneficial strategy.

At the same time, it is also important not to think in a vacuum. When assessing their first-party data sets, publishers must be knowledgeable about the broader data market and where they fit in. It is critical to know where their data will be competitive and where it will be complementary. How will they differentiate? Who will be a rival and who could be a partner? Is the data personally identifiable information? If so, is it managed in a privacy-safe way?

Finally, it is critical to know what business partners and potential competitors are doing. Is there a hole in the ship somewhere leading to leakage? Maybe there is an untapped revenue opportunity that is being missed because of an overabundance of caution. Maybe there is a rapidly growing risk pool because regulations are not properly understood.

Is there hope?

So why am I beating up on poor beleaguered publishers who are just trying to look good on panels? Because, in my opinion, this area is too important to ignore, and a serious wake-up call is needed if publishers hope to be successful in our ever-more-automated and data-driven market. If you are a publisher and a large portion of your revenue comes from selling ad impressions, it is imperative that your organization has skills and knowledge in using data and a strategy for its application to your business.

That means the first-party data that the business produces and owns, the second- or third-party data that it may buy or leverage and an understanding of the various data sets used by partners and possibly competitors that may impact how the publisher’s impressions are bought. Despite the important impact of GDPR, I believe there will be very little – if any – future market for untargeted impressions with no data attached to them. And if you are selling impressions into an automated marketplace and not managing the data and pricing that is going with those impressions, you are likely being arbitraged bigly.

So, what should publishers be doing here? As a first step, here is my suggested substitute sentence for future panels:

“We have spent years actively collecting and curating a large and unique audience/location/other data set. Because we have thought about its utility and the regulatory environment, we have made technology decisions that support our strategy. Now we are able to activate it in a compliant way internally across our business, with our marketer partners, with our tech partners and with other publishers.”

No more sitting around. No more messy piles. And no more vague references.

Take action. Know the rules. Get organized. Figure out how this key asset is going to drive your business.

Follow Jeremy Hlavacek (@jhlava), IBM Watson Advertising (@watsonads) and AdExchanger (@adexchanger) on Twitter.

1 Comment

  1. Jeremy- Great Article and excellent point. Can I also suggest we stop using the phrase “Deterministic Targeting” as it suggests that the representative data is or has been determined to be accurate as we all know it is not true. In fact what is claimed to be “Deterministic” is at best probabilistic and often has a lower real probability than what is being sold as “Probabilistic”. It does a great disservice to the industry when we misrepresent, for market advantage, to the marketer and to the public that we “know” so much more than we do.