
Big Data From Smart TVs Isn’t Enough To Measure Audiences


“On TV & Video” is a column exploring opportunities and challenges in advanced TV and video.

Today’s column is by Jonathon Wells, SVP of data science at Nielsen.

The benefits of technology are seemingly endless. We can check the security of our homes from our phones, receive grocery deliveries by drone – even drive cars that can parallel park autonomously. Our TVs are becoming equally advanced, offering content choices across an ever-growing landscape of platforms and channels.

Yet despite the many doors that smart TVs will open in the years ahead, they won’t – by themselves – provide the media industry with an accurate view of who’s using them.

Like all connected devices, smart TVs add to a growing proliferation of user-generated data. Automatic content recognition (ACR) is the technology that OEMs use to capture tuning on smart TVs. When combined with information that details representative, person-level behavior, these data sets significantly advance the science of audience measurement.

Given the wide adoption of smart TVs and the data they produce, it’s not surprising that an array of companies are looking to ACR data as a way to measure audiences. But ACR data isn’t sufficient on its own, because it lacks the most important element of audience measurement: people.

ACR data also has a critical validation flaw: It requires the OEM to match the image on the screen with a reference image to determine what content is being displayed.

ACR technology, explained

When working as designed, ACR technology monitors the images that are projected on the TV glass and uses them to infer what content is being displayed. The served images act like a fingerprint. But after collecting the fingerprints, the technology needs to determine which network or platform the image appeared on, as well as when it appeared. To make that determination, it needs to match the image on the screen with an image contained in an OEM-maintained reference library.

There are three possible outcomes when the technology attempts to make that match:

  • The image matches a single entry in the library 
  • The image matches multiple entries in the library
  • A matching image isn’t in the library

Clearly, the first outcome is the ideal scenario. The second comes with some level of miscrediting risk, because the technology has to guess which of the multiple matching entries to credit. In the third scenario, no one gets credit; most commonly, that happens because the content aired on a network the OEM doesn’t monitor.
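The three outcomes above can be sketched in code. This is a hypothetical illustration of the crediting logic, not any OEM’s actual implementation; the function name, library structure and fingerprint values are all invented for the example.

```python
# Hypothetical sketch of ACR crediting: match a screen fingerprint
# against an OEM-maintained reference library. All names and data
# here are illustrative, not a vendor's real system.

def credit_fingerprint(fingerprint, reference_library):
    """Return the crediting outcome for one screen fingerprint.

    reference_library maps fingerprint -> list of (network, airing_time)
    entries. Three outcomes are possible: a unique match, an ambiguous
    match, or no match at all.
    """
    matches = reference_library.get(fingerprint, [])
    if len(matches) == 1:
        return ("credited", matches[0])   # ideal: one unambiguous entry
    if len(matches) > 1:
        return ("ambiguous", matches)     # miscrediting risk
    return ("unmatched", None)            # e.g. an unmonitored network

# A toy library covering only some networks and airings.
library = {
    "fp-123": [("Network A", "2021-09-01T20:00")],
    "fp-456": [("Network B", "2021-09-01T20:00"),
               ("Network B", "2021-09-02T02:00")],  # rerun of same content
}

print(credit_fingerprint("fp-123", library))  # unique match: credited
print(credit_fingerprint("fp-456", library))  # multiple matches: ambiguous
print(credit_fingerprint("fp-789", library))  # not in library: unmatched
```

The third case is the one the rest of this column focuses on: a fingerprint that falls outside the reference library produces no credit at all.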


Filling the gaps in ACR data

Even if image matching were technically viable as a stand-alone measurement solution, it wouldn’t be feasible in practice. To start, maintaining a library of every single frame of every event on television is no small task, and that task will only keep growing. There are also no standard retention periods for images.

So how do we know the ACR technology will make the right match? Without a mechanism that can fill in the blanks, we don’t.

That’s why Nielsen has invested in watermarks, which are far more deterministic than signatures. Watermarks represent all content, filling the gaps that big data leaves on its own. That way, big data from sources like ACR provides the benefit of scale in an increasingly segmented media landscape. And when we use weighting controls to calibrate big data with person-level viewing data, we can see comparison points that would otherwise be blank.

In a recent study, Nielsen sought to understand the degree to which these reference library gaps affect ACR tuning logs, the basis for ACR-based measurement. In a September 2021 common homes analysis, we analyzed data from our two ACR provider partners to understand where reference library gaps might factor into measurement. We looked at both the concentration of viewing sources and the viewed minutes from the available sources.

Across all sources, we found that our ACR provider partners monitor just 31% of the available stations. That means they don’t maintain data in their reference libraries for 69% of the stations. 

When we looked at minutes viewed, we found that 23% of minutes came from stations that are not monitored. That means companies leveraging ACR data alone for measurement would be undercounting household-level impressions by 23%.
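A back-of-the-envelope calculation shows what that gap means at the household level. The total-minutes figure below is made up purely for illustration; only the 23% share comes from the study.

```python
# Illustrative undercount math. Only the 23% unmonitored share is from
# the study; the total is a hypothetical round number.
total_minutes = 1_000_000        # hypothetical total household viewing minutes
unmonitored_share = 0.23         # minutes viewed on stations absent from ACR libraries

measured = total_minutes * (1 - unmonitored_share)
missed = total_minutes * unmonitored_share

print(f"ACR-only measurement credits {measured:,.0f} of {total_minutes:,} minutes")
print(f"{missed:,.0f} minutes ({unmonitored_share:.0%}) go uncredited")
```

However large the real totals, the uncredited share stays the same: ACR-only measurement would miss roughly one minute in four.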

Supplementing data for a fuller picture

Despite the limitations of ACR data on its own, we understand the opportunity of scale and reach that it provides as an additional source of coverage.

By integrating big data sets with our viewing data, which provides representative measurement of the total US, we can significantly increase our sample sizes while applying rigorous data science methodologies to fill in the gaps and ensure fair representation of the total US audience across all networks and platforms.

Follow Nielsen (@Nielsen) and AdExchanger (@adexchanger) on Twitter.
