Journal: Knowledge and Information Systems

Capabilities of outlier detection schemes in large datasets, framework and methodologies

Abstract

Outlier detection is concerned with discovering exceptional behaviors of objects. Its theoretical principles and practical implementation lay a foundation for important applications such as credit card fraud detection, discovery of criminal behavior in e-commerce, and computer intrusion detection. In this paper, we first present a unified model for several existing outlier detection schemes and propose a compatibility theory, which establishes a framework for describing the capabilities of various outlier formulation schemes in terms of matching users' intuitions. Under this framework, we show that the density-based scheme is more powerful than the distance-based scheme when a dataset contains patterns with diverse characteristics. The density-based scheme, however, is less effective when the patterns are of comparable density to the outliers. We then introduce a connectivity-based scheme that improves the effectiveness of the density-based scheme when a pattern itself is of similar density to an outlier. We compare the density-based and connectivity-based schemes in terms of their strengths and weaknesses, and demonstrate applications with different features in which each is more effective than the other. Finally, the connectivity-based and density-based schemes are comparatively evaluated on both real-life and synthetic datasets in terms of recall, precision, rank power and implementation-free metrics.
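
The contrast the abstract draws between distance-based and density-based schemes can be illustrated with a small sketch. It is not the paper's implementation. It assumes a k-th-nearest-neighbor distance as the distance-based score and the standard local outlier factor (LOF) as the density-based score. The toy dataset, the choice k = 3, and all function names are hypothetical. The connectivity-based scheme is not covered here.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (distance-based score)."""
    out = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(ds[k - 1])
    return out

def lof_scores(points, k):
    """Local outlier factor of each point (density-based score)."""
    n = len(points)
    kdist = k_distances(points, k)
    # k-neighborhood: every other point within the k-distance (ties included).
    nbrs = [[j for j in range(n)
             if j != i and dist(points[i], points[j]) <= kdist[i]]
            for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(kdist[j], dist(points[i], points[j])) for j in nbrs[i])
        lrd.append(len(nbrs[i]) / reach)
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in nbrs[i]) / (len(nbrs[i]) * lrd[i])
            for i in range(n)]

def rank_of(scores, idx):
    """1-based rank of point idx when scores are sorted from high to low."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order.index(idx) + 1

# Hypothetical toy data: two patterns of very different density plus one
# point planted just outside the dense pattern.
dense = [(x, y) for x in range(5) for y in range(5)]                   # spacing 1
sparse = [(100 + 10 * x, 10 * y) for x in range(5) for y in range(5)]  # spacing 10
planted = (8, 2)
points = dense + sparse + [planted]
outlier_idx = len(points) - 1

k = 3  # assumed neighborhood size
kd = k_distances(points, k)
lof = lof_scores(points, k)
dist_rank = rank_of(kd, outlier_idx)
lof_rank = rank_of(lof, outlier_idx)
print("rank by k-distance:", dist_rank)
print("rank by LOF:", lof_rank)
```

On this toy data the planted point ranks first under LOF but behind all 25 points of the sparse pattern under the k-distance score. The reason is that a single global distance threshold cannot suit two patterns whose spacing differs by a factor of ten, while LOF compares each point only with its own neighborhood.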
