Even anonymized and aggregated consumer data may not be as anonymous as people have been led to believe, according to new academic research.
Researchers concluded that aggregated data — big batches of information on things like mobile devices’ movements, compiled for use in summarized form — can be unraveled to reveal the actual movements of specific individuals with about 73% to 91% accuracy, even from pools combining hundreds of thousands of users.
Companies that provide information to marketers about people’s dining or shopping habits, for example, typically strip the data of personally identifiable elements like names or addresses, then aggregate it before sending it off for use.
The researchers exposed weaknesses in aggregation as a privacy safeguard by analyzing counts of mobile devices at each location across successive points in time, teasing out the short journeys that emerge as patterns in the data.
When veteran venture capitalist Adrian Colyer read the research, he described going through six stages of horror: “Huh? What? How? Nooooo, Oh No, Oh s*@#!”
The paper, titled “Trajectory Recovery from Ash: User Privacy Is NOT Preserved in Aggregated Mobility Data,” was published at the 26th International World Wide Web Conference last month in Perth.
“Using this approach, it turns out we can link together short journey segments pretty well,” said Colyer, a venture partner at Accel and former chief technology officer at SpringSource, VMware and Pivotal. “The final trick is to recognize that many people make very similar journeys day after day — and those journeys are unique to them. So by looking for similar journeys across days, you can link up journey segments for the same individual across days.”
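To make the linking step concrete, here is a minimal Python sketch under simplifying assumptions; it is not the researchers’ code. It assumes each time slot’s aggregated counts have already been expanded into one point per device at the reporting location’s coordinates, and it links each partial trajectory to the next-slot point that minimizes total travel distance. The function names `link_segments` and `recover_trajectories` are hypothetical, and the brute-force search only suits a handful of devices; a realistic attack would need a polynomial-time assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations
from math import dist


def link_segments(ends, next_points):
    """Assign each trajectory end to one next-slot point, minimizing total distance.

    Hypothetical helper: brute force over all permutations, so only practical
    for a few devices. Returns (assignment, total_cost).
    """
    if len(ends) != len(next_points):
        raise ValueError("need exactly one next-slot point per trajectory")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(next_points))):
        # perm[i] is the index of the next point assigned to trajectory i
        cost = sum(dist(ends[i], next_points[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


def recover_trajectories(slots):
    """Stitch per-slot points (order shuffled, as aggregation would leave them)
    into one trajectory per device.

    Assumption: slots[t] holds one (x, y) point per device at time slot t.
    """
    trajectories = [[p] for p in slots[0]]
    for points in slots[1:]:
        ends = [traj[-1] for traj in trajectories]
        assignment, _ = link_segments(ends, points)
        for traj, j in zip(trajectories, assignment):
            traj.append(points[j])
    return trajectories
```

With two devices moving steadily apart, `recover_trajectories([[(0, 0), (10, 10)], [(11, 10), (1, 0)], [(12, 10), (2, 0)]])` returns `[[(0, 0), (1, 0), (2, 0)], [(10, 10), (11, 10), (12, 10)]]`, even though each slot lists its points in shuffled order.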
When a device lingers in one place for an extended period, that place can be inferred to be a home or workplace, and those anchor points make each trajectory uniquely distinguishable. As Colyer wrote in a May 15 blog post about the research:
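The home-and-work inference can be sketched in a similarly simplified way. This hypothetical `infer_home_work` function takes one recovered trajectory as (hour of day, location ID) observations and picks the most frequent location at night as home and the most frequent location during business hours as work. The hour windows are illustrative assumptions, not values from the paper.

```python
from collections import Counter

# Illustrative assumptions, not taken from the research:
NIGHT_HOURS = set(range(0, 6)) | {22, 23}  # when a device is likely at home
WORK_HOURS = set(range(9, 18))             # when a device is likely at work


def infer_home_work(observations):
    """Guess (home, work) location IDs from (hour_of_day, location_id) pairs.

    Returns None for either slot if no observations fall in that window.
    """
    observations = list(observations)  # allow generators to be scanned twice
    night = Counter(loc for hour, loc in observations if hour in NIGHT_HOURS)
    day = Counter(loc for hour, loc in observations if hour in WORK_HOURS)
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work
```

A trajectory seen at location "A" overnight and at "B" from 9 a.m. to 5 p.m. yields `("A", "B")`; brief stops elsewhere outside those windows do not affect the result.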
“To put that more plainly, given the aggregated dataset, and knowledge of your home and work locations, there’s a very good chance I can recover your full movements!!!”
Of course, it’s questionable whether marketers would go to the trouble of exposing identities through such an arduous process, privacy experts said.
“The difficulty of taking an aggregated data set that we might provide … and working that backward to identify an individual is pretty high,” said Kirsten McMullen, chief privacy officer and VP of compliance at mobile ad firm 4Info. “And the reward for doing so? Pretty low and generally not worth whatever incremental value they’d get from that data, at least for the companies we sell to: agencies and brands.”
Still, there are players besides marketers who might be interested in unraveling such data.
“I can think of a lot of bad ways that data could be used by law enforcement, or very large scale tech companies or financial institutions, or by bad actors assisting in illegal activities such as helping domestic abusers locate a former partner,” McMullen added. “And to me, that’s where it really matters.”
It’s worth noting that mobile carriers and countless third-party tech firms often collect and aggregate mobile data exhaust in the first place for commercial purposes such as marketing and advertising.
“The term ‘anonymous aggregate data’ needs to be scrubbed from our vocabulary until techniques or tech comes along that will fix this,” said Pam Dixon, founder and executive director of World Privacy Forum. “In general though, there is just too much data and too much ability to predict now.”