Estimating the success of re-identifications in incomplete datasets using generative models

Luc Rocher; Julien M. Hendrickx; Yves-Alexandre de Montjoye


Abstract

While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.
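As a rough illustration of what "individual uniqueness" means in the abstract, the sketch below computes the chance that nobody else in a population shares a given record. It is not the copula-based model the authors propose: it assumes every other person matches the record independently with a known probability, and the names `uniqueness_probability`, `p_match`, and `population_size` are hypothetical.

```python
import math


def uniqueness_probability(p_match: float, population_size: int) -> float:
    """Chance that no other person in the population shares a given record.

    Simplifying assumption (not taken from the abstract): each of the other
    population_size - 1 people matches the record independently with
    probability p_match.
    """
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must lie in [0, 1]")
    if population_size < 1:
        raise ValueError("population_size must be at least 1")

    others = population_size - 1
    if others == 0:
        return 1.0  # a single person is trivially unique
    if p_match == 1.0:
        return 0.0  # everyone else matches with certainty

    # (1 - p)^(n - 1), computed via log1p to stay accurate for tiny p.
    return math.exp(others * math.log1p(-p_match))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show how the value is read.
    print(uniqueness_probability(p_match=1e-9, population_size=1_000_000))
```

Under this assumption, a record is more likely to be unique when its attribute combination is rarer or the population is smaller, which is why adding attributes tends to push uniqueness toward 1.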


Bibliographic details

Source: Nature Communications, 2019, Issue 1, 9 pages
Authors: Luc Rocher; Julien M. Hendrickx; Yves-Alexandre de Montjoye
Original format: PDF

