BibFinder/StatMiner: Effectively Mining and Using Coverage and Overlap Statistics in Data Integration

Abstract

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. In this paper we present StatMiner, a system for estimating the coverage and overlap statistics while keeping the amount of needed statistics tightly under control. StatMiner uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We will demonstrate the major functionalities of StatMiner and the effectiveness of the learned statistics in BibFinder, a publicly available computer science bibliography mediator we developed. The sources that BibFinder integrates are autonomous and can have uncontrolled coverage and overlap. An important focus in BibFinder was thus to mine coverage and overlap statistics about these sources and to exploit them to improve query processing.
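
To make the two statistics concrete, here is a minimal Python sketch. It is not StatMiner's algorithm: it assumes the full answer set for a query is known, whereas the abstract describes estimating the statistics per query class. The source names, the paper identifiers, and the `greedy_order` helper are hypothetical and exist only for illustration. Coverage is taken to be the fraction of a query's answers that one source returns, and overlap the fraction returned by every source in a group.

```python
from itertools import combinations

def coverage(source_answers, all_answers):
    """Fraction of all known answers to a query that one source returns."""
    if not all_answers:
        return 0.0
    return len(source_answers & all_answers) / len(all_answers)

def overlap(answer_sets, all_answers):
    """Fraction of all known answers that every listed source returns."""
    if not all_answers or not answer_sets:
        return 0.0
    common = set.intersection(*answer_sets)
    return len(common & all_answers) / len(all_answers)

# Hypothetical answers returned by three made-up sources for one query.
results = {
    "source_a": {"p1", "p2", "p3", "p4"},
    "source_b": {"p3", "p4", "p5"},
    "source_c": {"p6"},
}
all_answers = set().union(*results.values())  # six distinct answers

cov = {name: coverage(ans, all_answers) for name, ans in results.items()}
ovl = {
    (a, b): overlap([results[a], results[b]], all_answers)
    for a, b in combinations(sorted(results), 2)
}

def greedy_order(results):
    """Hypothetical helper: call sources in order of new answers they add."""
    remaining = dict(results)
    seen, order = set(), []
    while remaining:
        # Ties are broken alphabetically because candidates are sorted first.
        best = max(sorted(remaining), key=lambda n: len(remaining[n] - seen))
        order.append(best)
        seen |= remaining.pop(best)
    return order

print(cov)
print(ovl)
print(greedy_order(results))
```

The ordering step shows why overlap matters alongside coverage: a source with high coverage may add little if the sources already called return most of the same answers.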

Bibliographic record

Source: Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany | 2003 | pp. 1097-1100 | 4 pages
Conference location: Berlin (DE)
Authors: Zaiqing Nie; Subbarao Kambhampati; Thomas Hernandez
Author affiliation: Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-5406
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Effectively mining and using coverage and overlap statistics for data integration [J]. Zaiqing Nie, Kambhampati S., Nambiar U. IEEE Transactions on Knowledge and Data Engineering. 2005, No. 5

2. Integrating Genomics and Clinical Data for Statistical Analysis by Using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): System Design and Implementation [J]. Julian Gruendner, Nicolas Wolf, Lars Tögel. Journal of Medical Internet Research. 2020, No. 10

3. Integrating domain knowledge with statistical and data mining methods for high-density genomic SNP disease association analysis [J]. Dinu V, Zhao H, Miller PL. Journal of Biomedical Informatics. 2007, No. 6

4. BibFinder/StatMiner: Effectively Mining and Using Coverage and Overlap Statistics in Data Integration [C]. Zaiqing Nie, Subbarao Kambhampati, Thomas Hernandez. International Conference on Very Large Databases. 2003

5. Mining and using coverage and overlap statistics for data integration [D]. Nie, Zaiqing. 2004

6. Integrating text mining data mining and network analysis for identifying genetic breast cancer trends [O]. Gabriela Jurca, Omar Addam, Alper Aksac. 2016

7. A Frequency-based Approach for Mining Coverage Statistics in Data Integration [O]. Zaiqing Nie, Subbarao Kambhampati. 2004

8. Evaluation of the National Library of Medicine's Programs in the Medical Behavior Sciences. Coverage, Overlaps, and Gaps in Bibliographic Databases Dealing with the Medical Behavioral Sciences (MBS) Literature. Study 1 [R]. Steere, D. T., Griffith, B. C., Cowan, J. A. 1983
