"Data-Driven Thinking" is written by members of the media community and contains fresh ideas on the digital revolution in media.
Today’s column is written by Tom Weiss, chief technology officer and chief data scientist at Dativa.
While we're still largely confined to our homes, frequently glued to our screens, ads supporting the bulk of online content continue to flow, and data pipelines are still running. The ad tech world relies on data science technologies to make these systems work, even when no one is at the office.
These platforms can vary in complexity for implementation, integrations, tools and features, and they provide ad tech’s backbone by marrying, managing, visualizing and applying data from disparate sources so marketing and media professionals have accurate insights and metrics to steer their decisions. For advertisers across the digital and media landscape who rely on ad tech to source, manage and measure performance for ad buys, the reliability of these systems depends on the underlying solutions and data science teams enabling them.
In my work helping companies use data science to measure media and ad performance, I’ve had the chance to work with many platforms for different use cases on behalf of customers. Based on their ease-of-use, robust sets of tools and capabilities and support even when most businesses are shuttered, here are five of my favorite data science upstarts helping keep our industry moving today.
1. Databricks
Databricks' founders were among the original creators of Apache Spark, a lightning-fast unified analytics engine for big data and machine learning, which the Databricks platform extends.
Nobody knows the underlying technology better. The platform smartly packages Spark environments for Amazon Web Services or any other cloud, with stable underlying data structures and Python workbooks for users.
Databricks is increasingly used by organizations across the media industry. Comcast, for example, used Databricks to optimize its machine learning to support personalized experiences and voice controls for customers, all while optimizing operations and cross-team collaboration.
Expect to see wider adoption because of its security and performance advantages over vanilla AWS solutions, as privacy remains top of mind for media, advertising and marketing professionals who want to maximize their use of data without compromising the privacy of their customers and audiences.
2. Sisense for Cloud Data Teams (formerly Periscope Data)
Founded by Google and Microsoft veterans, Periscope Data was recently acquired by Sisense and renamed Sisense for Cloud Data Teams.
One of my favorite tools, it offers amazing dashboards and an analysis alternative for those tired of Tableau's opaque algorithms and limited analysis controls. I originally came across Periscope Data because of its better licensing terms for multiple users but have stuck with it because it's so easy to turn SQL into dashboards.
It provides a super fast visualization tool that makes sense of raw data from sources ranging from MySQL and PostgreSQL to Salesforce, turning it into charts in just a few steps. It's a powerful way to augment a team's capabilities, connecting data from different areas of your business to see the impact of advertising and marketing budgets.
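To make the SQL-to-dashboard idea concrete, here is a minimal sketch of the general pattern rather than anything specific to Sisense: a single SQL query turns raw rows into the label-and-value pairs a chart needs. It uses Python's built-in sqlite3 module as a stand-in data source, and the `ad_spend` table, its columns and the `spend_by_channel` function are hypothetical names made up for illustration.

```python
import sqlite3


def spend_by_channel(rows):
    """Aggregate (channel, spend) rows with SQL and return chart-ready pairs.

    The table and column names are illustrative assumptions, not part of
    any vendor's product.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ad_spend (channel TEXT, spend REAL)")
    conn.executemany("INSERT INTO ad_spend VALUES (?, ?)", rows)
    # One query does the work a dashboard tile needs: group, sum, sort.
    result = conn.execute(
        "SELECT channel, SUM(spend) AS total "
        "FROM ad_spend GROUP BY channel ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("search", 100.0), ("social", 50.0), ("search", 25.0)]
    for channel, total in spend_by_channel(sample):
        print(f"{channel}: {total}")
```

A dashboard tool essentially wraps this step, handing the query result to a charting layer so the analyst only has to write the SQL.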
As with most of the companies on this list, the bulk of its analytics tools are not exclusive to ad tech and are helping make sense of data across industries. In the wake of COVID-19, for example, Sisense CEO Amir Orad posted a note from a customer on LinkedIn who said they were using its solution to support clinical AI capabilities in reaching high-risk patients as part of New York City's COVID-19 Rapid Response Coalition.
3. Snowplow
Snowplow is one of the simplest ways to pipeline ad tech data in real time. Developed to help data scientists manage the collection and warehousing of data across all their platforms, it quickly found users in and out of the ad tech space. I like that it lets users maintain ownership of their first-party data, with management and validation of data quality, flexible data structures and access to granular insights. With privacy paramount today, there's a good reason Snowplow users include tier-one media publishers and leading brands and retailers.
Snowplow's core pipeline tech is open source, making it a great option for companies looking to expand their data capability on a budget. In these turbulent times, Snowplow is working closely with its customers to help optimize and lower their cloud costs.
4. KNIME
KNIME is open-source software with a simple graphical interface and drag-and-drop tools that can be used to pull together data processing pipelines.
It takes longer to set up than working with straight Python, but I've found that the effort’s worth it. Developed for the pharmaceutical industry, it’s gained broader popularity for its easy-to-use tools and UX that makes machine learning more accessible. It’s particularly useful when working in a context where models need to be continuously updated, and the users aren't actual machine learning experts. In the media space, we’ve seen companies deploy KNIME to implement a home-grown recommendation engine for DTC streaming services.
The platform has even been used to visualize the growth of COVID-19 infections and the impact of containment measures on flattening the curve by region, in order to predict trajectories for regions following suit.
5. Starburst
Starburst provides an abstraction layer so users can run data warehouse analytics across numerous data sources without first consolidating the data in one location. It's a bit like what Databricks is doing with Spark, though it's more accessible for marketers and business professionals who are fluent in SQL and not implementing machine learning.
For marketers in particular, it’s proven to be an exceptional tool for running analytics and creating reports on the fly based on disparate data sources. It's also easy to adjust sources as they change or grow without the need to start from scratch each time.
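As a rough illustration of querying sources in place, the sketch below joins two separately stored databases with one SQL statement. It is not Starburst or Presto code: it uses Python's built-in sqlite3 module and SQLite's `ATTACH DATABASE` as a small-scale stand-in for a federated query engine, and the `impressions` and `sales` tables, the `crm` schema name and both function names are hypothetical.

```python
import sqlite3


def build_sources():
    """Create two separate in-memory databases reachable from one connection.

    'main' stands in for an ad server's data and 'crm' for a sales system;
    both are illustrative assumptions.
    """
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent database under the schema name 'crm'.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE main.impressions (campaign TEXT, impressions INTEGER)")
    conn.execute("CREATE TABLE crm.sales (campaign TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO main.impressions VALUES (?, ?)",
        [("spring", 1000), ("summer", 4000)],
    )
    conn.executemany(
        "INSERT INTO crm.sales VALUES (?, ?)",
        [("spring", 50.0), ("summer", 100.0)],
    )
    return conn


def revenue_per_thousand(conn):
    """Join across both sources without copying either into the other."""
    query = """
        SELECT i.campaign, 1000.0 * s.revenue / i.impressions AS rpm
        FROM main.impressions AS i
        JOIN crm.sales AS s ON s.campaign = i.campaign
        ORDER BY i.campaign
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_sources()
    for campaign, rpm in revenue_per_thousand(connection):
        print(f"{campaign}: {rpm:.2f} revenue per 1,000 impressions")
    connection.close()
```

The point for a SQL-fluent marketer is that the query names each source by schema and the engine handles the rest, so adding or swapping a source means changing a table reference rather than rebuilding a warehouse.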
In response to the current crisis, it recently announced licenses for Starburst Enterprise Presto would be available for free to universities, hospitals and healthcare research organizations with teams dedicated to analyzing COVID-19 data.
For the advertising industry, these upstarts are helping organizations make better sense of data they already have access to and put it to work, both by feeding it into systems and by providing insights that help advertisers and marketers achieve better ROI on ad buys and reach more relevant audiences.
When evaluating these data science platforms, I’ve primarily considered whether what they are doing is sufficiently better than the tools I have today to make it worth switching from a competing solution. There's very little new in data science – it's all about making it faster, more efficient and less prone to errors – and these up-and-comers are ticking the boxes.