Journal: Data Mining and Knowledge Discovery

A tutorial on statistically sound pattern discovery

Abstract

Statistically sound pattern discovery harnesses the rigour of statistical hypothesis testing to overcome many of the issues that have hampered standard data mining approaches to pattern discovery. Most importantly, application of appropriate statistical tests allows precise control over the risk of false discoveries: patterns that are found in the sample data but do not hold in the wider population from which the sample was drawn. Statistical tests can also be applied to filter out patterns that are unlikely to be useful, removing uninformative variations of the key patterns in the data. This tutorial introduces the key statistical and data mining theory and techniques that underpin this fast-developing field. We concentrate on two general classes of patterns: dependency rules that express statistical dependencies between condition and consequent parts, and dependency sets that express mutual dependence between set elements. We clarify alternative interpretations of statistical dependence and introduce appropriate tests for evaluating statistical significance of patterns in different situations. We also introduce special techniques for controlling the likelihood of spurious discoveries when multitudes of patterns are evaluated. The paper is aimed at a wide variety of audiences. It provides the necessary statistical background and summary of the state-of-the-art for any data mining researcher or practitioner wishing to enter or understand statistically sound pattern discovery research or practice. It can serve as a general introduction to the field of statistically sound pattern discovery for any reader with a general background in data sciences.
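
To make the idea concrete, here is a minimal sketch of how one dependency rule X -> A might be tested for significance while accounting for the number of candidate patterns. It is an illustration under stated assumptions, not the procedure given in the paper: it assumes a one-sided Fisher's exact test with fixed margins and a plain Bonferroni correction, and the function name, the counts, and the number of candidate rules are all hypothetical.

```python
from math import comb


def fisher_upper_tail(n_xa: int, n_x: int, n_a: int, n: int) -> float:
    """One-sided Fisher's exact p-value for positive dependence.

    Returns P(N_XA >= n_xa) under independence of X and A, with the
    margins n_x (rows containing X), n_a (rows containing A) and the
    total row count n held fixed (hypergeometric distribution).
    """
    upper = min(n_x, n_a)
    total = comb(n, n_x)
    # math.comb returns 0 when k > n, so impossible tables add nothing.
    tail = sum(comb(n_a, k) * comb(n - n_a, n_x - k)
               for k in range(n_xa, upper + 1))
    return tail / total


# Hypothetical counts: 100 rows, 30 contain X, 40 contain A, 25 contain both.
p_value = fisher_upper_tail(n_xa=25, n_x=30, n_a=40, n=100)

# Hypothetical search space: 1000 candidate rules were evaluated, so a
# Bonferroni correction divides the significance level by 1000.
alpha = 0.05
num_candidates = 1000
is_significant = p_value < alpha / num_candidates

print(f"p = {p_value:.3e}, significant after correction: {is_significant}")
```

Bonferroni is used here only because it is the simplest way to bound the chance of any false discovery; it becomes very conservative when the number of candidate patterns is large.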
