Venue: Annual conference on Neural Information Processing Systems (NeurIPS)

What do row and column marginals reveal about your dataset?

Abstract

Numerous datasets, ranging from group memberships within social networks to purchase histories on e-commerce sites, are represented by binary matrices. While this data is often proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive and may be furnished for analysis. Here, we investigate how these marginals can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, by implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm whose asymptotic running time is the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings.
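The efficient algorithm from the abstract is not reproduced here. Instead, this minimal brute-force sketch illustrates the quantity it targets. It enumerates every binary matrix whose row and column sums match the given marginals. It then reports P(H(i, j) = 1) for each cell, under our assumption that every consistent matrix is equally likely. The function names `matrices_with_marginals` and `entry_probabilities` are hypothetical. Enumeration takes exponential time, so this is only a ground-truth check for tiny matrices.

```python
from fractions import Fraction
from itertools import combinations, product


def matrices_with_marginals(row_sums, col_sums):
    """Yield every binary matrix (a tuple of row tuples) whose row and column
    sums equal the given marginals. Exponential time: tiny inputs only."""
    n_cols = len(col_sums)
    # Each row i may be any 0/1 vector with exactly row_sums[i] ones.
    row_choices = []
    for r in row_sums:
        rows = []
        for ones in combinations(range(n_cols), r):
            rows.append(tuple(1 if j in ones else 0 for j in range(n_cols)))
        row_choices.append(rows)
    # Combine row choices; keep a candidate only if every column sum matches.
    for candidate in product(*row_choices):
        if all(sum(row[j] for row in candidate) == col_sums[j] for j in range(n_cols)):
            yield candidate


def entry_probabilities(row_sums, col_sums):
    """Return P(H[i][j] = 1) for every cell, ASSUMING H is drawn uniformly
    from all binary matrices consistent with the marginals (hypothetical helper)."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    counts = [[0] * n_cols for _ in range(n_rows)]
    total = 0
    for h in matrices_with_marginals(row_sums, col_sums):
        total += 1
        for i in range(n_rows):
            for j in range(n_cols):
                counts[i][j] += h[i][j]
    if total == 0:
        raise ValueError("no binary matrix has these row and column sums")
    # Exact fractions: the share of consistent matrices with a 1 in each cell.
    return [[Fraction(c, total) for c in row] for row in counts]


if __name__ == "__main__":
    # Rows sum to [2, 1, 1] and columns to [2, 1, 1]; five matrices qualify.
    for row in entry_probabilities([2, 1, 1], [2, 1, 1]):
        print([str(p) for p in row])
    # ['4/5', '3/5', '3/5']
    # ['3/5', '1/5', '1/5']
    # ['3/5', '1/5', '1/5']
```

Exact fractions make a quick sanity check possible. By linearity of expectation, the probabilities in row i sum exactly to the row marginal r_i, and likewise for each column.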