“The Sell Sider” is a column written for the sell side of the digital media community.
Today’s column is written by Michael Manoochehri, chief technology officer at Switchboard Software.
Why do data projects continue to fail at an alarming rate?
Gartner estimated that through 2017, 60% of big data projects would fail to move beyond piloting and experimentation.
There are common reasons these projects struggle, but when it comes to data, the advertising industry is anything but common.
Our industry is experiencing a historic surge in data volume. Complexity is growing due to the success of programmatic advertising, and publishers are demanding access to an ever-larger number of data sources.
From a data perspective, I believe there are three main barriers that prevent publishers from maximizing the value of their data, and it’s time we start talking about them. While there’s no quick fix to solve these problems, there are practical steps publishers can start taking now to prevent their data from losing value.
Obtaining a unified view of how content relates to revenue requires combining multiple programmatic data sources. Most supply-side platforms (SSPs) provide reporting data in some fashion, but there remains a daunting lack of standard granularity in available data.
Some SSPs provide data broken out to the individual impression, while others only provide daily aggregates. Granularity mismatches become an even greater challenge when each source generates different versions of currencies, date formats, time formats, bid rate formats and so on. These differences add up fast, and when they do, the inability to build a unified view of all data lowers its overall value.
To solve data granularity issues, publishers must apply business rules to normalize their data. These rules can be defined by options in a vendor's UI, SQL code in the data warehouse or pipeline code by an engineer. Business rules describe how data “should” look – bid rates as decimal values versus percentages, for example – from a given source. Lack of visibility into where business rules are defined can cause costly problems.
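As a minimal sketch of that idea, the business rules for each source can live in one visible place and be applied in code rather than scattered across vendor UIs and SQL. The source names, formats and field names below are hypothetical, invented for illustration:

```python
from datetime import datetime, timezone

# Hypothetical business rules for two sources: how each one formats
# bid rates and dates before normalization.
SOURCE_RULES = {
    "ssp_a": {"bid_rate": "percentage", "date_format": "%m/%d/%Y"},
    "ssp_b": {"bid_rate": "decimal", "date_format": "%Y-%m-%d"},
}

def normalize_record(source, record):
    """Apply one source's business rules so records become comparable."""
    rules = SOURCE_RULES[source]
    out = dict(record)
    # Normalize bid rates to decimal values (0.0-1.0).
    if rules["bid_rate"] == "percentage":
        out["bid_rate"] = record["bid_rate"] / 100.0
    # Normalize dates to ISO 8601, treated as UTC.
    parsed = datetime.strptime(record["date"], rules["date_format"])
    out["date"] = parsed.replace(tzinfo=timezone.utc).date().isoformat()
    return out
```

For example, `normalize_record("ssp_a", {"bid_rate": 2.5, "date": "06/01/2018"})` returns a record with the bid rate as the decimal `0.025` and the date as `"2018-06-01"`, so rows from both sources can be joined directly. The point is less the code than the visibility: the rules are one diff away from review.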
Time and time again, I’ve observed engineering teams debate with C-level executives about inaccuracies in revenue reporting. The reason is often that somewhere along the data supply chain, a business rule changed in an undetectable way. To prevent granularity from becoming an issue, there must be transparency around business rules.
To get started, I suggest going through the exercise of simply accounting for all steps in the data supply chain to document how rules are being applied, and by whom. Knowing how business rules are used for normalization is essential to preserving the value of the data.
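One way to run that accounting exercise is to capture each step as data, so the audit itself can be reviewed and versioned. The steps, rules and owners below are invented for illustration:

```python
# Hypothetical inventory of a data supply chain: for each step, record the
# rule applied, where it is defined and who owns it, so changes are traceable.
SUPPLY_CHAIN = [
    {"step": "SSP report export", "rule": "bid rates reported as percentages",
     "defined_in": "vendor UI", "owner": "ad ops"},
    {"step": "warehouse load", "rule": "timestamps cast to UTC",
     "defined_in": "SQL view", "owner": "data engineering"},
    {"step": "revenue dashboard", "rule": "revenue converted to USD",
     "defined_in": "BI tool settings", "owner": "finance"},
]

def audit_report(chain):
    """Render the supply chain as one readable document for review."""
    return "\n".join(
        f'{s["step"]}: {s["rule"]} (defined in {s["defined_in"]}, owner: {s["owner"]})'
        for s in chain
    )
```

Even a flat list like this answers the two questions that matter when a revenue number looks wrong: where was the rule applied, and who can explain the change?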
SSPs often promote how accessible their data is through application programming interfaces (APIs), but sometimes that’s not the reality.
Publishers and advertisers rely on a network of multiple heterogeneous data sources, but many are completely unprepared for the rate of change, problems and quirks exhibited by SSP APIs. SSP APIs can and will change or break. As the number of APIs under management grows, the possible points of failure multiply, creating a distributed systems problem. Laws like the General Data Protection Regulation also require that API credentials and data access be managed securely using best available practices. Ultimately, any time a team can’t contend with errors or downtime from a given source, its data loses value.
I've met many talented engineering teams who struggle to understand how much of their API-based reporting works, so it should come as no surprise that line-of-business leaders often feel in the dark as well. To prevent API challenges from becoming too daunting, I suggest that engineering teams proactively develop both processes to monitor API health and data operations playbooks to react to changes and outages. They should ensure that API credentials are not tied to user accounts and that they are stored with secure password management software.
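As a sketch of the reactive half of that advice, a thin retry wrapper around any SSP fetch gives the playbook a predictable failure mode instead of a silent gap in the data. The function and parameters here are assumptions, not any particular vendor's API:

```python
import time

def call_with_retries(fetch, retries=3, backoff=2.0):
    """Run a fetch callable, retrying transient failures with exponential backoff.

    In practice, `fetch` would wrap a real SSP reporting call, with its API
    token read from a secrets manager rather than a personal user account.
    """
    last_err = None
    for attempt in range(retries):
        try:
            return fetch()
        except ConnectionError as err:
            last_err = err          # log and escalate per the data ops playbook
            time.sleep(backoff * (2 ** attempt))  # e.g. 2s, 4s, 8s
    raise RuntimeError(f"API still failing after {retries} attempts") from last_err
```

The escalation path on the final `RuntimeError` — who gets paged, what the fallback data source is — belongs in the playbook itself, not in anyone's head.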
Playbooks are crucial for engineering to diagnose problems and for the line-of-business manager to understand what’s going on. They also serve as excellent handover documentation if there is engineering churn.
Advertising data volume is exploding, and heterogeneous data sources are proliferating. This one-two punch puts up a tough scalability barrier that’s difficult to fight through.
Ultimately, delivering scalability comes down to smart infrastructure planning. However, many businesses assume there’s no work to be done until they actually need to scale, which is a dangerous mistake. Keep in mind that not all infrastructure products – however helpful they may be – are suitable for scaling past a certain size. Scale problems hit unexpectedly, and when they do, queries slow to a crawl, dashboards fail and data is lost.
To prepare for scalability challenges, publishers should start measuring now. Specifically, they must understand how much data they have, how queries are performing and how responsive their dashboards are. Can their infrastructure handle unexpected spikes in volume, whatever those spikes might look like? They should also walk through what they would do if their data exceeded current capacity.
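A minimal sketch of that kind of measurement: time each dashboard query and flag when the recent average degrades. The threshold and window below are placeholders any team would tune to its own baseline:

```python
import time
from statistics import mean

def timed_query(run_query, sql):
    """Run a query and return (rows, seconds elapsed) so latency can be logged."""
    start = time.perf_counter()
    rows = run_query(sql)
    return rows, time.perf_counter() - start

def latency_degraded(history_seconds, threshold_s=5.0, window=7):
    """Flag when the average of the last `window` runs exceeds the threshold."""
    recent = history_seconds[-window:]
    return mean(recent) > threshold_s
```

Logging these numbers for a few weeks turns "our dashboards feel slow" into a trend line, which is exactly the evidence needed to justify infrastructure work before a spike forces it.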
Publishers should also know that there are significant performance differences between standard on-premises databases and cloud data warehouses. In my experience, once the data required for daily analysis approaches 10 GB, standard databases become slow and costly to maintain. Publishers must understand the tradeoffs when they eventually need to migrate to new infrastructure.
One final thought: Keep in mind that breaking through these barriers will always require some engineering. Publishers should try to get proactive about aligning the goals of engineering and business teams. The more closely aligned they are, the faster they’ll get more value from their data.