International Conference on Advanced Data Mining and Applications

Outlier Detection on Mixed-Type Data: An Energy-Based Approach

Abstract

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, either continuous or discrete. However, real-world data is increasingly heterogeneous: a single data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from the Mv.RBM as the outlier score, so that outliers are detected as data points lying in low-density regions. The method is fast to learn and compute, and it scales to massive datasets. At the same time, the outlier score is identical to the negative log-density of the data up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of the Mv.RBM is a powerful and efficient outlier scoring method that is highly competitive with state-of-the-art methods.
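
To make the scoring idea concrete, the following is a minimal sketch of free-energy outlier scoring, not the paper's implementation. It assumes a simplified RBM with only two visible types (unit-variance Gaussian and binary) and binary hidden units, whereas the abstract does not list the types Mv.RBM supports. The parameters are random stand-ins for a trained model, and the names `free_energy` and `flag_outliers`, as well as the percentile threshold, are hypothetical choices made for illustration.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


def free_energy(v_cont, v_bin, a_cont, a_bin, W_cont, W_bin, b):
    """Free energy F(v) of an RBM with mixed visible units.

    Assumed model (illustrative, not taken from the source):
      E(v, h) = 0.5 * ||v_cont - a_cont||^2 - a_bin . v_bin
                - b . h - (v_cont W_cont + v_bin W_bin) . h
    Summing out the binary hidden units h gives F(v), and
    -log p(v) = F(v) + log Z, so F(v) ranks points by density.
    """
    hidden_input = b + v_cont @ W_cont + v_bin @ W_bin  # shape (n, K)
    visible_term = 0.5 * np.sum((v_cont - a_cont) ** 2, axis=1) - v_bin @ a_bin
    return visible_term - softplus(hidden_input).sum(axis=1)


def flag_outliers(scores, percentile=95.0):
    # Hypothetical decision rule: flag the highest-energy points.
    return scores > np.percentile(scores, percentile)


rng = np.random.default_rng(0)
n, d_cont, d_bin, n_hidden = 200, 3, 2, 4

# Random parameters stand in for a fitted model (assumption).
W_cont = 0.1 * rng.standard_normal((d_cont, n_hidden))
W_bin = 0.1 * rng.standard_normal((d_bin, n_hidden))
a_cont = np.zeros(d_cont)
a_bin = np.zeros(d_bin)
b = np.zeros(n_hidden)

# Synthetic mixed-type data with one planted extreme point.
v_cont = rng.standard_normal((n, d_cont))
v_bin = rng.integers(0, 2, size=(n, d_bin)).astype(float)
v_cont[0] = 100.0

scores = free_energy(v_cont, v_bin, a_cont, a_bin, W_cont, W_bin, b)
flags = flag_outliers(scores)
```

Because the partition function Z is the same for every data point, it never has to be computed: ranking by `scores` is equivalent to ranking by negative log-density under the assumed model. Each score costs one matrix product per visible type, which is consistent with the abstract's claim that scoring is cheap.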