Venue: IEEE Symposium on Security and Privacy

KHyperLogLog: Estimating Reidentifiability and Joinability of Large Data at Scale



Abstract

Understanding the privacy-relevant characteristics of data sets, such as reidentifiability and joinability, is crucial for data governance, yet these characteristics can be difficult to compute for large data sets. While computing them by brute force is straightforward, the scale of the systems and data collected by large organizations demands an efficient approach. We present KHyperLogLog (KHLL), an algorithm based on approximate counting techniques that can estimate the reidentifiability and joinability risks of very large databases in linear time and with minimal memory. KHLL enables one to measure the reidentifiability of data quantitatively, rather than relying on expert judgement or manual reviews. Meanwhile, joinability analysis using KHLL helps ensure the separation of pseudonymous and identified data sets. We describe how organizations can use KHLL to improve the protection of user privacy. The efficiency of KHLL allows one to schedule periodic analyses that detect deviations from the expected risks over time, serving as a regression test for privacy. We validate the performance and accuracy of KHLL through experiments using proprietary and publicly available data sets.
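The abstract describes KHLL only as an algorithm "based on approximate counting techniques." The Python sketch below illustrates one way such a structure can work, assuming a two-level design: a K-Minimum-Values (KMV) sample of hashed field values, where each retained value carries a small HyperLogLog (HLL) of the user IDs seen with it. From that, it estimates the number of distinct values, the share of values tied to a single ID (a reidentifiability signal), and the containment of one value set in another (a joinability signal). All names (`hash64`, `TinyHLL`, `KHLL`, `uniqueness_ratio`, `containment`), the truncated SHA-256 hash, the defaults `k=256` and `p=8`, and the KMV containment estimator are illustrative assumptions, not the authors' implementation or API.

```python
# Illustrative sketch only: structure, names, and parameters are assumptions,
# not the KHLL paper's code.
import hashlib
import heapq
import math
from typing import Dict, List


def hash64(item: str, salt: str) -> int:
    """Deterministic 64-bit hash (truncated SHA-256); salts separate values from IDs."""
    digest = hashlib.sha256((salt + item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TinyHLL:
    """Minimal HyperLogLog over 2**p registers, with linear counting for small sets."""

    def __init__(self, p: int = 8) -> None:
        if p < 7:
            raise ValueError("the alpha approximation below assumes p >= 7")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = hash64(item, "id:")
        idx = h >> (64 - self.p)                       # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)   # linear counting for small cardinalities
        return raw


class KHLL:
    """K smallest value hashes (KMV), each paired with an HLL of the IDs seen with that value."""

    def __init__(self, k: int = 256, p: int = 8) -> None:
        self.k = k
        self.p = p
        self.hlls: Dict[int, TinyHLL] = {}  # retained value hash -> HLL of IDs
        self._heap: List[int] = []          # negated hashes, i.e. a max-heap of retained hashes

    def add(self, value: str, user_id: str) -> None:
        h = hash64(value, "val:")
        if h in self.hlls:
            self.hlls[h].add(user_id)
            return
        if len(self.hlls) < self.k:
            heapq.heappush(self._heap, -h)
        elif h < -self._heap[0]:
            evicted = -heapq.heapreplace(self._heap, -h)  # drop the largest retained hash
            del self.hlls[evicted]
        else:
            return  # not among the K smallest hashes: this value is not sampled
        self.hlls[h] = TinyHLL(self.p)
        self.hlls[h].add(user_id)

    def distinct_values(self) -> float:
        if len(self.hlls) < self.k:
            return float(len(self.hlls))    # exact while fewer than K values have been seen
        kth = -self._heap[0] / 2.0 ** 64    # K-th smallest hash, scaled to [0, 1)
        return (self.k - 1) / kth           # standard KMV cardinality estimate

    def uniqueness_ratio(self, max_ids: int = 1) -> float:
        """Fraction of sampled values seen with at most `max_ids` distinct IDs."""
        if not self.hlls:
            return 0.0
        low = sum(1 for hll in self.hlls.values() if round(hll.estimate()) <= max_ids)
        return low / len(self.hlls)


def containment(a: KHLL, b: KHLL) -> float:
    """Estimate |A ∩ B| / |A| over the value sets, from the K smallest hashes of A ∪ B."""
    k = min(a.k, b.k)
    sample = heapq.nsmallest(k, set(a.hlls) | set(b.hlls))
    in_a = [h for h in sample if h in a.hlls]
    if not in_a:
        return 0.0
    return sum(1 for h in in_a if h in b.hlls) / len(in_a)
```

For example, feeding `(zip_code, user_id)` pairs into a `KHLL` and calling `uniqueness_ratio()` approximates the share of ZIP codes observed with only one user, while a `containment(a, b)` value near 1 suggests that a column in one data set could be used to join against another. In this sketch each record is processed once, and memory is bounded by K HLLs of `2**p` registers each, independent of the input size.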
