Getting Practical – But Not Personal – With Differential Privacy

Being able to share information about a group of people without compromising any individual person’s privacy kinda sounds like a form of wizardry.

But it’s not. It’s just math.

I say “just” not to downplay how technical the process is, but rather to highlight that privacy-enhancing technologies (PETs) – like differential privacy, as described above – are no longer abstract. They’ve entered the mainstream.

“Differential privacy went from a theoretical concept, something you’d think about in grad school, to ‘Oh, OK, I can get a job doing this outside of just being a professor,’” said Ryan Rogers, a mathematician who’s now on his second stint at Apple.

Rogers rejoined Apple in March as a machine learning researcher with a focus on data and privacy after a little more than five years at LinkedIn as tech lead on differential privacy (DP).

His career trajectory – from an applied math and computational sciences PhD at UPenn to Apple to LinkedIn and back to Apple again – represents the broader shift that PETs have made from the classroom into practical research labs across Silicon Valley.

PET projects

Over the past eight or nine years, all the big platforms have invested in operationalizing differential privacy: Apple, Facebook, Google, Snap, LinkedIn and even TikTok.

Apple, for example, uses DP to draw inferences about user behavior on its devices to power its recommendations, like which emojis are trending or new popular words appearing in texts.
In 2020, Facebook used it to make large data sets available to researchers studying the impact of sharing misinformation on elections.
Google has used it to support some of the APIs in the Chrome Privacy Sandbox and to gather data for training neural networks.

Meanwhile, LinkedIn has been using differential privacy to measure the real-time performance of posts. This method allows content creators on the platform to see what’s resonating with certain audiences without getting access to any specific demographic information, such as who exactly saw a post.

The goal in each of these cases is to strike the often-tricky balance between privacy and utility.

“There’s been a realization in the industry that things we’ve done in the past, like aggregation, might no longer be sufficient for protecting privacy,” Rogers said.

Not that differential privacy – or any PET, for that matter – can achieve perfection. Perfect privacy is only possible if no data is shared at all, and if nothing is shared, there is no utility.

“Think of differential privacy as existing on a spectrum,” Rogers said.

In other words, there has to be a tradeoff.

Adding more statistical noise or randomness to a data set means the privacy guarantee is stronger, but the output will likely be less accurate, and vice versa. The ratio depends on your risk tolerance level, what you’re trying to achieve and the sensitivity of the data set in question.
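For the code-inclined, here is a minimal sketch of that tradeoff using the Laplace mechanism, a textbook differential privacy technique. It is an illustration only and is not attributed to any of the companies mentioned here. The function names, the count of 1,000 and the epsilon values are all assumptions chosen for the example. Epsilon is the usual privacy parameter: smaller values mean more noise and a stronger guarantee.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws follows a
    # Laplace distribution centered at zero with the given scale.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: the noise scale is sensitivity / epsilon.
    # A sensitivity of 1 assumes one person changes the count by at most 1.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=10_000, seed=0):
    # Average distance between the noisy answer and the true answer.
    rng = random.Random(seed)
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Hypothetical example: a post with 1,000 views.
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(1000, eps)
        print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

Running it should show the error shrinking as epsilon grows, which is the tradeoff in miniature: less noise gives a more accurate count but a weaker privacy guarantee.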

PEDAL to the privacy

LinkedIn’s investment in differential privacy for post analytics was about being proactive rather than reactive to risk.

One logical way to protect the privacy of someone who views a post is to only share aggregated information with the post’s author, like the top job title among viewers or a company name.

But LinkedIn’s applied research team wondered whether it would be possible for a bad-acting author to combine that information and monitor real-time updates to profiles on LinkedIn as a way to identify exactly who engaged with a post.
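To see why exact aggregates can leak, consider a deliberately simplified, hypothetical sketch of what's known as a differencing attack. This is not LinkedIn's documented attack or data. The job titles, the snapshots and the function names are all invented for illustration. The idea is that comparing two exact snapshots of aggregated analytics isolates whatever the newest viewer contributed.

```python
from collections import Counter


def aggregate_titles(viewers):
    # Exact (non-private) count of viewers per job title.
    return Counter(v["title"] for v in viewers)


def differencing_attack(before, after):
    # Subtracting the earlier snapshot from the later one leaves only
    # the contribution of whoever viewed the post in between.
    return after - before


if __name__ == "__main__":
    # Hypothetical viewers of a post at two points in time.
    earlier = [{"title": "Engineer"}, {"title": "Engineer"}, {"title": "Recruiter"}]
    later = earlier + [{"title": "Chief Economist"}]

    leaked = differencing_attack(aggregate_titles(earlier), aggregate_titles(later))
    print(leaked)  # Counter({'Chief Economist': 1})
```

If only one person at a given company holds that title, the "aggregate" has effectively named them. Noise added to each count would blur exactly this kind of subtraction.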

Although LinkedIn had never seen an attack like that happen in the wild, the team, helmed by Rogers at the time, decided to dig in and find out whether a risk really existed.

And, apparently, it did. They discovered it was technically possible to identify around 9% of post viewers using a small amount of demographic information.

The upshot of this research was the development and release of a privacy tool late last year called PEDAL, which stands for Privacy-Enhanced Data Analytics Layer.

If you’re a data scientist or some other variety of math-minded brainiac, you can dive into the details here. But I’m neither, so, in short, PEDAL applies multiple differential privacy algorithms to inject noise into event-level data before it’s shared with LinkedIn’s analytics platform.

The result is that the people viewing LinkedIn posts can’t be identified – but the person posting them can still get useful analytics instantly. Balance = struck.

“With differential privacy, you can still get useful insights from data without revealing anything at the individual level,” Rogers said. “The point here is to be as practical as possible.”

🙏 Thanks for reading (wherever you happen to be doing so; this is a judgment-free zone)! As always, feel free to drop me a line at allison@adexchanger.com with any comments or feedback.
