Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany

BibFinder/StatMiner: Effectively Mining and Using Coverage and Overlap Statistics in Data Integration

Abstract

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. In this paper, we present StatMiner, a system for estimating the coverage and overlap statistics while keeping the needed statistics tightly under control. StatMiner uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We will demonstrate the major functionalities of StatMiner and the effectiveness of the learned statistics in BibFinder, a publicly available computer science bibliography mediator we developed. The sources that BibFinder integrates are autonomous and can have uncontrolled coverage and overlap. An important focus in BibFinder was thus to mine coverage and overlap statistics about these sources and to exploit them to improve query processing.
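
To make the terms concrete, the following is a minimal Python sketch of what per-class coverage and pairwise overlap statistics might look like. It is not StatMiner's algorithm, which the abstract does not spell out. The query log, the source names `SourceA` and `SourceB`, the class labels, the `min_queries` threshold, and the single fallback class `ANY` (standing in for one step up a class hierarchy) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical query log: (query class, {source name: set of answer ids}).
LOG = [
    ("conference=VLDB", {"SourceA": {1, 2, 3}, "SourceB": {2, 3, 4}}),
    ("conference=VLDB", {"SourceA": {5, 6}, "SourceB": {6}}),
    ("conference=AAAI", {"SourceA": {7}, "SourceB": {7, 8, 9}}),
]


def mine_statistics(log, min_queries=2, root="ANY"):
    """Estimate per-class coverage and pairwise overlap of sources.

    Classes seen in fewer than `min_queries` queries are folded into `root`,
    an assumed stand-in for generalizing one level up a class hierarchy.
    """
    freq = defaultdict(int)
    for cls, _ in log:
        freq[cls] += 1

    total = defaultdict(int)    # class -> distinct answers summed over queries
    covered = defaultdict(int)  # (class, source) -> answers the source returned
    shared = defaultdict(int)   # (class, s1, s2) -> answers both sources returned

    for cls, answers in log:
        key = cls if freq[cls] >= min_queries else root
        total[key] += len(set().union(*answers.values()))
        for src, ids in answers.items():
            covered[(key, src)] += len(ids)
        for s1, s2 in combinations(sorted(answers), 2):
            shared[(key, s1, s2)] += len(answers[s1] & answers[s2])

    # Normalize counts by the class totals; skip classes with no answers.
    coverage = {k: n / total[k[0]] for k, n in covered.items() if total[k[0]]}
    overlap = {k: n / total[k[0]] for k, n in shared.items() if total[k[0]]}
    return coverage, overlap


coverage, overlap = mine_statistics(LOG)
```

With this toy log, the rarely asked `conference=AAAI` class falls below the threshold, so its statistics are kept only at the coarser `ANY` level. A mediator could then prefer sources with high coverage and low overlap with the sources already called for a given query class.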