Don’t Drown In Your Data Lake

“The Sell Sider” is a column written by the sell side of the digital media community.

Today’s column is written by James Curran, founder and chief product officer at Staq.

Publishers are starting to ramp up their investment in data, especially as they gain more prowess in programmatic advertising. Many are throwing around the idea of creating their own data warehouse or have already started the process.

Data is currency. Amazon uses data to give users a more relevant retail experience. Google uses data to organize all information on the internet. Facebook and Twitter use data to map social behavior and interests to give users a more engaging social feed.

Publishers need to be careful. Anyone with a data lake today must make sure they have a license to fish from it, and to invite others to fish too. Just as important, the project can become overwhelmingly costly if publishers aren’t ready for it.

Publishers might be able to handle the rolled-up reports they get from their advertising and data partners, but collecting and storing log-level data is far more complicated. Before they get in too deep, publishers must answer three questions. What do they want to use the data for? Do they have the resources to collect, review and manage it? Can they analyze it well enough to make an investment in a warehouse worthwhile?

Otherwise, their data lake ends up with quicksand at the bottom.

Focus on the goal

Data can be currency, but not every data point is valuable enough to keep. Simply collecting and parking every data point that comes into an organization is a bad idea. Storing it will be expensive, and publishers will have trouble squeezing insights from a huge data set. The more they store, the bigger their security risk.

Instead, publishers should start with a clear goal. Better still, they shouldn’t move forward on collecting log-level data until they’re sure it will answer at least their three most burning questions.

They might have a goal to normalize pricing across inventory based on advertiser bidding patterns. Or they may want to understand the market value of different pieces of advertising inventory on their website, cross-analyzed with different audience groups. In that case, they’d need a solution that merges data from their data management platform (DMP) and their ad servers. Every business problem is different, so every data warehouse should look different.
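To make the second example concrete, here is a minimal Python sketch of that merge. It assumes the ad server log and the DMP export share a common user ID. The field names (`user_id`, `ad_unit`, `price_cpm`) and segment labels are hypothetical. A real warehouse would more likely run this join in SQL, but the logic is the same.

```python
from collections import defaultdict

# Hypothetical ad server log rows: one row per impression, keyed by user ID.
ad_server_rows = [
    {"user_id": "u1", "ad_unit": "homepage_top", "price_cpm": 4.0},
    {"user_id": "u2", "ad_unit": "homepage_top", "price_cpm": 2.0},
    {"user_id": "u1", "ad_unit": "article_mid", "price_cpm": 3.0},
    {"user_id": "u3", "ad_unit": "article_mid", "price_cpm": 1.0},
]

# Hypothetical DMP export: user ID -> audience segment.
dmp_segments = {"u1": "sports_fans", "u2": "travelers"}


def avg_cpm_by_unit_and_segment(rows, segments):
    """Join impressions to audience segments and average the CPM per (ad unit, segment)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of CPMs, impression count]
    for row in rows:
        # Users missing from the DMP export land in an explicit "unknown" bucket
        segment = segments.get(row["user_id"], "unknown")
        key = (row["ad_unit"], segment)
        totals[key][0] += row["price_cpm"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


result = avg_cpm_by_unit_and_segment(ad_server_rows, dmp_segments)
for (ad_unit, segment), cpm in result.items():
    print(ad_unit, segment, cpm)
```

Keeping unmatched users in an "unknown" bucket, rather than dropping them, makes it visible how much inventory the DMP data actually covers.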

Publishers will need a plan that identifies the data they’ll look at every day and how it should be rolled up. For example, do they want to see programmatic bids by page or by content? Clearing price by advertiser or by exchange? Fluctuations over a specific period?
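A roll-up like the ones described above groups raw bid records by time period and by one dimension, then aggregates them. This Python sketch assumes hypothetical bid records with `date`, `exchange`, `advertiser` and `clearing_price` fields. Changing the `dimension` argument switches between clearing price by exchange and clearing price by advertiser.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log-level bid records, for illustration only.
bids = [
    {"date": "2024-01-01", "exchange": "exchange_a", "advertiser": "adv_1", "clearing_price": 2.0},
    {"date": "2024-01-01", "exchange": "exchange_a", "advertiser": "adv_2", "clearing_price": 3.0},
    {"date": "2024-01-01", "exchange": "exchange_b", "advertiser": "adv_1", "clearing_price": 1.5},
    {"date": "2024-01-02", "exchange": "exchange_a", "advertiser": "adv_1", "clearing_price": 4.0},
]


def roll_up(records, dimension):
    """Group records by (date, dimension) and return bid count and average clearing price."""
    groups = defaultdict(list)
    for record in records:
        groups[(record["date"], record[dimension])].append(record["clearing_price"])
    return {
        key: {"bids": len(prices), "avg_clearing_price": mean(prices)}
        for key, prices in groups.items()
    }


by_exchange = roll_up(bids, "exchange")
by_advertiser = roll_up(bids, "advertiser")
print(by_exchange)
print(by_advertiser)
```

Storing only daily roll-ups like these for most questions can be far cheaper than keeping every raw bid indefinitely.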

Keep your head out of the clouds, even if the data is in the cloud

With a plan in place, publishers need a location. Think of a data warehouse like a real brick-and-mortar building that will store stuff. Publishers need a clean, safe and secure storage facility. They need to be able to grant access to some people and restrict it for others. Trucks need to be able to pull up and drop off data at regular times, and publishers need a place to put incoming data that’s organized and fits with what’s already there.
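One common way to keep the loading dock organized is to file each incoming delivery under a predictable path keyed by source and date. This layout is often called Hive-style partitioning, and query engines such as Hive, Spark and Athena can use it to skip irrelevant files. The sketch below only builds the path string. The bucket and partner names are hypothetical, and the actual upload would go through whichever cloud SDK the publisher uses.

```python
from datetime import date


def partition_path(root: str, source: str, day: date, filename: str) -> str:
    """Return a Hive-style partitioned path: <root>/source=<source>/date=<YYYY-MM-DD>/<filename>."""
    # A slash in the source name would silently create an extra directory level
    if "/" in source:
        raise ValueError(f"source name must not contain '/': {source!r}")
    return f"{root.rstrip('/')}/source={source}/date={day.isoformat()}/{filename}"


# Hypothetical bucket and partner names, for illustration only
path = partition_path("s3://example-bucket/raw/", "exchange_a", date(2024, 1, 1), "bids.csv.gz")
print(path)
```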

For publishers with limited resources, these responsibilities might stretch beyond the reasonable limits of their organization. Don’t let developer hubris get in the way of a prudent decision. They probably do not need their own servers or room in a custom data center.

Amazon Web Services, Microsoft Azure or Google Cloud will likely end up being the best partner because they offer a relatively full-service package, and that’s OK. The most important points to cover are that the data is secure, organized and accessible, and that the setup can absorb the influx of new data without becoming unmanageable.

The opposite of ‘set it and forget it’

Speaking of unmanageable, log-level data has a habit of spinning out of control, and keeping it in shape takes a lot more than an organized warehouse. Publishers also need personnel. To take the programmatic example, a typical publisher pulls in data from 10, 20 or 30 different sources every day. And every day, there are errors in that data.

As for internal data, a publisher often manages many websites, and some publishers are themselves collections of many publishers. Depending on the organization, each site and publisher may have a completely different content and ad placement architecture. Key-value pairs, inventory organization and other important elements need to be constantly checked and reconciled as people update and add to different parts of the business.
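Reconciling key-value pairs can start as simply as checking each site’s targeting keys against a shared taxonomy. This minimal sketch assumes a hypothetical canonical dictionary (`CANONICAL_KVS`) of allowed keys and values and reports anything that drifts from it. The key names are illustrative and do not reflect any ad server’s actual schema.

```python
# Hypothetical shared taxonomy: allowed targeting keys and their allowed values.
CANONICAL_KVS = {
    "section": {"news", "sports", "lifestyle"},
    "pos": {"top", "mid", "bottom"},
}


def reconcile(site_kvs):
    """Return (site, key, value, problem) tuples for pairs that don't match the taxonomy."""
    issues = []
    for site, pairs in site_kvs.items():
        for key, value in pairs:
            if key not in CANONICAL_KVS:
                issues.append((site, key, value, "unknown key"))
            elif value not in CANONICAL_KVS[key]:
                issues.append((site, key, value, "unknown value"))
    return issues


# Hypothetical key-value pairs currently in use on two sites.
site_kvs = {
    "site_a": [("section", "news"), ("pos", "top")],
    "site_b": [("sect", "sports"), ("pos", "middle")],
}
issues = reconcile(site_kvs)
for issue in issues:
    print(issue)
```

Here `site_b` uses a misspelled key and a nonstandard position value. Both would fragment any cross-site report until someone fixes them.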

Publishers will need the resources to find and fix errors among millions or billions of partner data points before they back up the truck and dump the data into the warehouse. On top of that, APIs stop working, field names change and partners change their policies. Publishers need to stay on top of every minute change, or they will fall victim to the “garbage in, garbage out” problem.
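A validation gate in front of the warehouse is one way to catch renamed fields and bad values before they are loaded. The sketch below assumes a hypothetical partner report schema (`date`, `ad_unit`, `impressions`, `revenue`). Rows that fail are set aside with a reason instead of going into the warehouse.

```python
# Hypothetical expected schema for a partner report: field name -> accepted type(s).
REQUIRED_FIELDS = {"date": str, "ad_unit": str, "impressions": int, "revenue": (int, float)}


def validate(rows):
    """Split rows into accepted rows and (row index, reason) rejections."""
    accepted, rejected = [], []
    for i, row in enumerate(rows):
        # A renamed field in the partner's API shows up here as a missing field
        missing = REQUIRED_FIELDS.keys() - row.keys()
        if missing:
            rejected.append((i, f"missing fields: {sorted(missing)}"))
            continue
        bad_types = [field for field, kind in REQUIRED_FIELDS.items() if not isinstance(row[field], kind)]
        if bad_types:
            rejected.append((i, f"wrong types: {bad_types}"))
            continue
        if row["impressions"] < 0 or row["revenue"] < 0:
            rejected.append((i, "negative metric"))
            continue
        accepted.append(row)
    return accepted, rejected


rows = [
    {"date": "2024-01-01", "ad_unit": "top", "impressions": 100, "revenue": 1.25},
    {"date": "2024-01-01", "adunit": "top", "impressions": 100, "revenue": 1.25},
    {"date": "2024-01-01", "ad_unit": "top", "impressions": "100", "revenue": 1.25},
    {"date": "2024-01-01", "ad_unit": "top", "impressions": -5, "revenue": 1.0},
]
accepted, rejected = validate(rows)
print(len(accepted), rejected)
```

Rejected rows would typically be logged and reviewed rather than silently dropped. A sudden spike in rejections is often the first sign that a partner changed its API or field names.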

At that point, the entire warehouse is compromised before they get a single insight from it. These labor costs are well understood in the world of physical storage but are often dangerously neglected in the world of data, even as analysts are actively looking for answers.

I know of one publisher that put all warehouse management responsibility on a single person. When that person left the company, their data warehouse sat unattended as tons of data piled up. Their storage costs and risk piled up, too. It was several months before the finance department noticed the mounting costs and figured out where they were coming from.

The moral of the story is that collecting and storing data is complicated, even before publishers get to the analysis phase. It requires a plan, goals, organization and oversight. Otherwise, all valuable insights will be sucked into the quicksand at the bottom of the lake.

Follow James Curran (@james_curran), Staq (@STAQ) and AdExchanger (@adexchanger) on Twitter.
