
An Ensemble Approach for Better Truth Discovery



Abstract

Truth discovery is an active research topic in the Big Data era. Its goal is to identify the true values among the conflicting data that multiple sources provide about the same data items. Many methods have been proposed to tackle this problem. However, because the methods differ in their characteristics, none of them is a clear winner that consistently outperforms the others. In addition, in some cases an improved method may not even beat its original version, owing to the bias introduced by limited ground truths or by the differing features of the datasets it is applied to. To realize an approach with better and more robust overall performance, we propose to fully leverage the advantages of existing methods by extracting the truth from the prediction results of these existing truth discovery methods. In particular, we first distinguish between the single-truth and multi-truth discovery problems and formally define the ensemble truth discovery problem. Then, we analyze the feasibility of the ensemble approach and derive two models, a serial model and a parallel model, to implement it and to tackle both types of truth discovery problems. Extensive experiments over three large real-world datasets and various synthetic datasets demonstrate the effectiveness of our approach.
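The abstract does not spell out the serial or parallel model, so the following is only a minimal sketch of the general idea it describes: treating the prediction results of several existing truth discovery methods as input and extracting a truth from them. It assumes plain majority voting as the combination rule, which is a stand-in and not the paper's method. The function names, the `threshold` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def ensemble_single_truth(predictions):
    """Combine single-truth predictions by majority vote (an assumed rule).

    predictions maps a method name to a dict of {data_item: predicted_value}.
    Returns {data_item: value predicted by the most methods}.
    """
    items = set()
    for result in predictions.values():
        items.update(result)

    truths = {}
    for item in sorted(items):
        # Count one vote per base method that made a prediction for this item.
        votes = Counter(
            result[item] for result in predictions.values() if item in result
        )
        truths[item] = votes.most_common(1)[0][0]
    return truths


def ensemble_multi_truth(predictions, threshold=0.5):
    """Combine multi-truth predictions (sets of values per data item).

    A value is kept when the fraction of base methods that predicted it
    is strictly greater than `threshold` (a hypothetical parameter).
    """
    items = set()
    for result in predictions.values():
        items.update(result)

    truths = {}
    for item in sorted(items):
        value_sets = [r[item] for r in predictions.values() if item in r]
        counts = Counter(value for values in value_sets for value in values)
        truths[item] = {
            value
            for value, count in counts.items()
            if count / len(value_sets) > threshold
        }
    return truths


# Hypothetical outputs of three base truth discovery methods.
single = {
    "method_a": {"book1.author": "Alice", "book2.author": "Bob"},
    "method_b": {"book1.author": "Alice", "book2.author": "Carol"},
    "method_c": {"book1.author": "Dave", "book2.author": "Carol"},
}
multi = {
    "method_a": {"book1.authors": {"Alice", "Bob"}},
    "method_b": {"book1.authors": {"Alice"}},
    "method_c": {"book1.authors": {"Alice", "Bob", "Eve"}},
}

print(ensemble_single_truth(single))
print(ensemble_multi_truth(multi))
```

Both functions treat every base method as equally reliable. An ensemble that weights methods by estimated reliability, or that feeds one method's output into another, would need a different combination rule.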
