How To Evaluate A Data Science Project (Without Knowing Data Science)

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Ellen Houston, managing director of applied data science at Civis Analytics.

Chances are, over the last few months of the COVID-19 crisis, your analytics team has been hard at work trying to understand what’s happening to consumer behavior. It’s an important part of your recovery strategy, but it’s not easy. The pandemic has had a significant impact on people’s perspectives, fears, and behaviors. Trends are changing, and the data you once knew to be predictive has hurtled into the world of the unknown. If you’re not a numbers person by training, how do you evaluate the methodology and results of your analytics team’s projects, and decide if you have enough faith in the findings to act?

As a career marketer who has steadily integrated into the analytics space, I have often been in situations like this. It can be intimidating, but if you ask the right questions and understand a few simple concepts, you’ll feel more confident and equipped to partner with analytics teams.

Perhaps more importantly, you’ll be able to call BS when you need to.

Let’s use some of the typical questions companies are facing right now to illustrate how to better understand the ins and outs of the project work without going back for a master’s degree. This usually involves a few steps: Get the data in one place, build and refine a model and then share the results.

What’s wrong with the data?

Any data scientist will tell you: No data set is ever going to be perfect. That’s not to say that imperfect data is bad or useless. You just need to understand how it’s imperfect and what impact that flaw could have on conclusions. For example, in the past maybe you only used data from your loyalty program. Loyalty members’ behavior is going to be different from (and likely more positive than) that of your broader consumer base or the general market.

Today, a common question is: How well will my pre-pandemic data help me understand future performance? The answer is there is still a lot of uncertainty around this. However, to limit that uncertainty you can do a few things.

First, be willing to take a few calculated risks. Test an offer or message in the market with a subset of your customers, then compare their response with how they performed on similar offers in the past to get an idea of the magnitude of the impact.

Second, begin to understand how your customer base has changed. This can be through surveys or profiling key geographic areas with publicly available data. The goal is to understand how the pandemic has impacted your customers, not just all US consumers.

Was economic hardship higher or lower than national averages? Are you concentrated in any areas more or less impacted by the pandemic? This will provide greater insight into what your customers need from your organization.
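To make the first step above concrete, here is a minimal Python sketch of how an analytics team might compare a small in-market test against the response rate on a similar past offer. The function name and all of the numbers are hypothetical, and the normal-approximation interval is just one common choice, not necessarily the method your team uses.

```python
import math

def lift_with_interval(base_conv, base_n, test_conv, test_n, z=1.96):
    """Change in response rate (test minus baseline) with an
    approximate 95% interval, using a normal approximation."""
    p_base = base_conv / base_n
    p_test = test_conv / test_n
    diff = p_test - p_base
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_base * (1 - p_base) / base_n
                   + p_test * (1 - p_test) / test_n)
    return diff, diff - z * se, diff + z * se

# Hypothetical: 500 of 10,000 customers responded to a similar past offer;
# 40 of 1,000 responded to the new test.
diff, low, high = lift_with_interval(500, 10_000, 40, 1_000)
print(f"Change in response rate: {diff:+.1%} (roughly {low:+.1%} to {high:+.1%})")
```

With these made-up numbers the plausible range includes zero, so the honest reading is that the test alone cannot tell you whether response has really dropped. That is a reasonable thing to ask your team about before acting on a single small test.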

What are you modeling and how are you doing it?

A conversation that only uses the terms “AI,” “machine learning” or “proprietary algorithm” should raise eyebrows. There are lots of nuances and pros and cons to the algorithms used, but understanding some of the basics goes a long way. If a team isn’t willing or able to explain the methodology used, that’s also a BS red flag.

First, you want to understand the exact item being predicted. For example, if someone says sales were the dependent variable, does that mean sales volume in units, sales revenue in dollars or a binary purchase of yes or no? All of these greatly impact how you can interpret the results of the model, and the more you understand what was being predicted, the more you will understand how to apply the modeled results.

Second, you want to understand what type of model was used. Again, you don’t need to know everything about random forests vs. logistic regression, but your team should be able to clearly explain what model they used and why. Ask about the pros and cons of the model in terms of accuracy and interpretability, and have them walk you through, in plain English, some of the important features and the process they used to improve the model. Hiding behind jargon and highly specialized terms will not help anyone in the room push together toward the best result.

Finally, have an open discussion. Learn what the team felt went well or what other data sources they wish they had. Ask what limitations they see. Much like data, no model is perfect. An open and productive discussion enables you to fully understand current limitations, so you can make the most informed decisions based on the results.
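The point above about knowing exactly what is being predicted can be illustrated with a short Python sketch. It builds three different “sales” targets (units, revenue and a yes/no purchase flag) from the same transactions. The customers, quantities and prices are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical transactions: (customer_id, units, price_per_unit)
transactions = [
    ("a", 2, 10.0),
    ("a", 1, 25.0),
    ("b", 5, 4.0),
]
customers = ["a", "b", "c"]  # customer "c" bought nothing

units = defaultdict(int)
revenue = defaultdict(float)
for customer_id, qty, price in transactions:
    units[customer_id] += qty
    revenue[customer_id] += qty * price

# Three candidate dependent variables, all of which could be called "sales"
targets = {
    customer_id: {
        "units": units[customer_id],
        "revenue": revenue[customer_id],
        "purchased": int(units[customer_id] > 0),
    }
    for customer_id in customers
}

for customer_id, values in targets.items():
    print(customer_id, values)
```

In this toy data, customer “b” buys more units while customer “a” generates more revenue, and both look identical on the yes/no flag. A model trained on one of these targets can rank customers differently from a model trained on another, which is why the question is worth asking.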

What are the results and recommendations?

OK, you understand the data. You walked through the model, and now you need to understand how to act on the results.

Make sure you understand which part of the work led to the recommendations and, if you implement them, how the outcome can be measured. Analytics is an ongoing process, and it’s important to iterate: sometimes the model will be right, and sometimes you’ll need to learn from what it got wrong.

Be an inquisitive consumer of the readout. Make sure you understand the variety of charts and graphs presented. Do all the axes start at zero? Can you understand the conclusions on a slide and explain them back?
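The question about axes matters more than it may seem. As a rough illustration, the Python sketch below compares how different two bars really are with how different they look when a chart’s axis starts above zero. The function name and the values are hypothetical.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Ratio of the two bar heights as drawn on a chart
    whose value axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)

# Hypothetical scores of 98 and 100
true_ratio = apparent_ratio(98, 100)                      # axis starts at zero
truncated_ratio = apparent_ratio(98, 100, axis_start=95)  # axis starts at 95

print(f"Actual difference: {true_ratio - 1:.0%}")
print(f"Difference as drawn: {truncated_ratio - 1:.0%}")
```

A gap of about 2% in the underlying numbers is drawn as a bar roughly two-thirds taller once the axis is cut off at 95. A truncated axis is not always wrong, but you should know when you are looking at one.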

Analytics can challenge conventional wisdom, and help you be more precise – but don’t ignore your gut. If the results are entirely inconsistent with past behavior, the analytics team should help you understand why. If they can’t, that’s a problem.

Be wary of intentionally or unintentionally misleading statistics. For example, percentages or indexes without an “n” size may make results look more extreme than they actually are. Additionally, don’t forget that correlation isn’t the same as causation, and focusing too heavily on directional findings can lead you to ignore potential confounding factors or fall victim to spurious correlation.
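To see why a percentage without an “n” can mislead, here is a small Python sketch that puts a plausible range around the same 60% result at two sample sizes. It uses the Wilson score interval, one standard way to do this; the survey counts are hypothetical and your team may prefer a different method.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical: "60% of customers prefer the new offer"
small = wilson_interval(6, 10)        # 6 of 10 respondents
large = wilson_interval(600, 1_000)   # 600 of 1,000 respondents

print(f"n=10:   {small[0]:.0%} to {small[1]:.0%}")
print(f"n=1000: {large[0]:.0%} to {large[1]:.0%}")
```

The headline number is 60% in both cases, but with only 10 respondents the plausible range is wide enough to include a minority preference, while with 1,000 it is only a few points on either side. That is the information an “n” size gives you.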

When you evaluate an analytics project, don’t just let your eyes glaze over at the sight of complicated math and assume the recommendations are right. Take the time to ask questions, and dig into anything you don’t understand. Chances are, someone else in the room will be glad you asked.

Follow Civis Analytics (@CivisAnalytics) and AdExchanger (@adexchanger) on Twitter.
