Italian Symposium on Advanced Database Systems

Data fusion with source authority and multiple truth (Discussion Paper)

Abstract

The abundance of data available on the Web makes it increasingly likely that different sources contain (partially or completely) different values for the same item. Data Fusion is the problem of discovering the true values of a data item when two entities representing it have been found and their values differ. Recent studies have shown that relying only on majority voting to find the true value of an object can give wrong results for up to 30% of the data items, because false values spread very easily when data sources frequently copy from one another. The problem must therefore be solved by assessing the quality of the sources and giving more weight to the values coming from trusted sources. State-of-the-art Data Fusion systems define source trustworthiness on the basis of the accuracy of the provided values and of the dependence on other sources. In this paper we propose an improved Data Fusion algorithm that extends existing methods based on accuracy and correlation between sources by also taking into account source authority, defined on the basis of the knowledge of which sources copy from which. Our method is also designed to work well in the multi-truth case, that is, when a data item can have multiple true values. Preliminary experimental results on a real-world multi-truth dataset show that our algorithm outperforms previous state-of-the-art approaches.
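To make the contrast in the abstract concrete, the following minimal Python sketch compares plain majority voting with a trust-weighted vote that can return several true values. It is an illustration only and not the algorithm proposed in the paper: the function names, the `threshold` parameter, the trust weights and the toy claims are all assumptions introduced here, and the weights are given by hand rather than derived from accuracy, copying or authority.

```python
from collections import defaultdict


def support(claims, trust=None):
    """Sum the weight behind each claimed value.

    claims: dict mapping source name -> set of values it asserts.
    trust:  dict mapping source name -> weight (hypothetical, hand-picked).
            With trust=None every source counts as 1.0 (plain voting).
    """
    scores = defaultdict(float)
    for source, values in claims.items():
        weight = 1.0 if trust is None else trust[source]
        for value in values:
            scores[value] += weight
    return dict(scores)


def majority_vote(claims):
    """Single-truth baseline: the value asserted by the most sources."""
    scores = support(claims)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(scores), key=scores.get)


def trusted_values(claims, trust, threshold):
    """Multi-truth variant: keep every value whose share of the total
    trust reaches `threshold` (an assumed cut-off, not from the paper)."""
    scores = support(claims, trust)
    total = sum(trust[source] for source in claims)
    return {value for value, score in scores.items() if score / total >= threshold}


# Toy data (invented): s3, s4 and s5 repeat the same value, as copiers would.
claims = {
    "s1": {"Rossi", "Bianchi"},
    "s2": {"Rossi"},
    "s3": {"Verdi"},
    "s4": {"Verdi"},
    "s5": {"Verdi"},
}
trust = {"s1": 0.9, "s2": 0.8, "s3": 0.2, "s4": 0.2, "s5": 0.2}

print(majority_vote(claims))                # Verdi
print(trusted_values(claims, trust, 0.35))  # Rossi and Bianchi (set order may vary)
```

In this toy case the three low-trust sources outvote the two trusted ones under majority voting, while the weighted vote recovers both values asserted by the trusted sources. A real system would have to estimate the weights from the data instead of fixing them in advance.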
