Data Science Sanity Check: How To Ask The Right Question

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media.

Today’s column is written by Khoa Pham, product and data analyst at PlaceIQ.

How many golf balls can you fit in a 747?

This simple, albeit somewhat ridiculous, question digs right into the heart of data analysis and is the bane of recent college grads looking for employment in any quantitative field.

How many hairs are there on a human head? How many piano tuners are there in the city of Chicago? How many words have been spoken since the beginning of time?

These questions test how we move from a series of assumptions about things we know little about to an answer we can’t immediately verify.

This type of estimation, called a Fermi question, is often used in engineering and sciences to scope a problem before attempting to build a complex model to derive a more precise answer. This more precise answer is often not needed and expending time and energy to calculate the answer can hurt the bottom line of the business.

This couldn’t be truer in data science, where money can disappear into black holes of “data analysis.” Having complex systems in place to compute forecasts and project inventory does not mean they should be used for every question that arises.

The influx of interest in mobile advertising, coupled with the recent infusion of “Big Data” into the ecosystem, has brought these types of Fermi questions to the forefront of an analyst’s day-to-day job.

How Many Impressions Can You Fit In A Phone?

As a complete newcomer to the world of mobile ads, I was faced with this very question on a daily basis. When dealing with millions or even tens of millions of impressions across a several-month period in an RFP, it is easy to get lost in these large numbers.

The progression from ideas in a meeting room to actual targetable line items that must deliver is a dark and scary road. It also doesn’t help that the names and descriptions of many segments are often mini-riddles in and of themselves.

Furthermore, what may make sense in aggregate may also break down when splitting the lump-sum number into individual line items.

It’s hard to hit a moving target.

Learning from Mistakes

Late last year, I was tasked with providing a geospatial targeting solution for a one-month campaign with a large financial company. They wanted to target coffee shops in Birmingham, Ala. Simple enough. The IO came in rushed, as it typically does, and the specific audience was not prebuilt. It fell on me to do a quick estimate on the number of impressions that can be served.

So, how many impressions can you serve to coffee shops in Birmingham, Ala., in one month on a single, fairly large publisher?

• There are about 1 million people in the Birmingham Metropolitan Statistical Area.

• I assumed two-thirds are in the targetable range of 18-65 years old, totaling 667,000 people.

• If around one-third have smartphones, that equals roughly 200,000 people (rounding down from about 222,000).

• Of that group, about one-fifth have one of the publisher’s apps, totaling 40,000 people.

• Assumed half of these people are coffee drinkers, or 20,000 people.

• Coffee drinkers go to coffee shops 12 times a month and will have the app open three of the 12 times.

• The final result: 20,000 coffee drinkers with the app open on three visits each, at one impression per visit, means about 60,000 impressions can be served per month to this line item.
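The chain of assumptions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a forecasting tool: the function and variable names are hypothetical, the shares are the guesses listed in the bullets, and it assumes one impression per app-open visit.

```python
from fractions import Fraction

# Assumptions copied from the bullets above; all names are illustrative only.
POPULATION = 1_000_000  # Birmingham Metropolitan Statistical Area, approx.
STEPS = [
    ("aged 18-65", Fraction(2, 3)),
    ("with a smartphone", Fraction(1, 3)),
    ("with one of the publisher's apps", Fraction(1, 5)),
    ("who drink coffee", Fraction(1, 2)),
]
# Of 12 monthly coffee-shop visits, the app is open on 3.
# Assumption: each app-open visit yields exactly one impression.
APP_OPEN_VISITS_PER_MONTH = 3


def fermi_estimate(population, steps, impressions_per_person):
    """Multiply a starting population through a chain of fractional filters."""
    audience = Fraction(population)
    for label, share in steps:
        audience *= share
        print(f"{label}: {int(audience):,} people")
    return int(audience * impressions_per_person)


monthly_impressions = fermi_estimate(POPULATION, STEPS, APP_OPEN_VISITS_PER_MONTH)
print(f"Estimated impressions per month: {monthly_impressions:,}")
```

Without rounding at each step, the chain lands at roughly 67,000 impressions rather than 60,000. The gap comes from rounding 222,000 smartphone owners down to 200,000, and either figure is the same order of magnitude, which is all a Fermi estimate is meant to deliver.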

The client initially wanted 10 times the number of impressions but we told them we just couldn’t do it.

It’s More Art Than Science

As marketers become increasingly obsessed with creating more distinctive and personalized targeting tactics, it is easy for logic to get lost in the shuffle. It often falls to the analyst to make sure the sanity checks are there and to transform the idea into something that is targetable and scalable.

Some key things I’ve learned:

• The first step towards a successful campaign is being able to scope and size an audience.

• Nail down exactly what the client wants and the key metrics by which they are judging your performance.

• Formalize any assumptions you have made and make sure the client understands these risks.

• Managing the client is as important as, if not more important than, managing your data.

As any salesman will tell you, getting any and all of the above from the client is easier said than done.

At the end of the day, it is the collaboration between the clients, agencies and advertisers that will overcome this hurdle of creating something new and meaningful. Not everything can be precomputed, forecasted and load-tested. This is true especially in our fast-paced, rapidly evolving industry.

It is OK to be wrong (sometimes).

As more and more people flock to mobile and expectations are set sky-high, analysts need to be the gatekeepers between what is desired and what can be done.

Follow Khoa Pham (@PhamTKhoa), PlaceIQ (@PlaceIQ) and AdExchanger (@adexchanger) on Twitter.



  1. Great insights, thank you. Having been in behavioral targeting technology for over six years, I think part of the seller’s guidance must be to prevent “Filter Fatigue.” Just because you can filter doesn’t mean you should use every filter. Every time you add a rule, you will likely be slicing down the size of the audience. The biggest culprit is age range. Unless the product is restricted to a niche group (e.g., alcohol must have 21+ in place), you can shut out users who are on the fringes of the targeting pool. Would an advertiser reject a sale from a 45-year-old woman when the RFP stated it was to be 25- to 44-year-olds? Add that registered data and you’ll lose potential sales, new customers and impressions. So just think about age ranges carefully.

  2. Oh my goodness. You actually just wrote this? Thank you, Khoa!

    What is frustratingly obvious to many of us is so often ignored by buyers who fall victim to the glamour of another sales rep’s pitch. “We want to geo-fence this business and only run in news apps. We need 4,000,000 impressions this month and this other vendor said it was no problem. So, if you can’t do it, your tech isn’t as good as theirs.”

    Doing mysterious math in the name of making a quick sale is a real problem in our industry and IMHO is as wasteful of advertisers’ dollars as any other challenge facing our industry today. Thank you for speaking out on this on behalf of a company that could grab short-term gains from fuzzy math but obviously isn’t!