Outlier Detection on Mixed-Type Data: An Energy-Based Approach

Abstract

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, either continuous or discrete. However, real-world data is increasingly heterogeneous: a single data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose using the free energy derived from the Mv.RBM as the outlier score, so that outliers are the data points lying in low-density regions. The method is fast to learn and compute, and it scales to massive datasets. At the same time, the outlier score is identical to the negative log-density of the data up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of the Mv.RBM is a powerful and efficient outlier scoring method that is highly competitive with state-of-the-art methods.
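The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified RBM with binary hidden units, binary visible units for discrete attributes, and unit-variance Gaussian visible units for continuous attributes, with already-trained parameters `a`, `b`, and `W`. The names `free_energy` and `flag_outliers`, and the percentile threshold, are hypothetical. Under these assumptions the free energy F(v) satisfies -log p(v) = F(v) + log Z, so ranking points by F(v) is the same as ranking them by negative log-density.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def free_energy(v, a, b, W, is_continuous):
    """Free energy F(v) of a simplified mixed-type RBM (an assumption,
    not the paper's exact parameterization).

    v: (n_samples, n_visible) data; a: (n_visible,) visible biases;
    b: (n_hidden,) hidden biases; W: (n_visible, n_hidden) weights;
    is_continuous: boolean mask, True for unit-variance Gaussian units,
    False for binary units.
    """
    v = np.atleast_2d(v).astype(float)
    cont = np.asarray(is_continuous, dtype=bool)
    # Visible terms: quadratic for Gaussian units, linear for binary units.
    visible = 0.5 * ((v[:, cont] - a[cont]) ** 2).sum(axis=1)
    visible -= v[:, ~cont] @ a[~cont]
    # Binary hidden units sum out analytically: one softplus per hidden unit.
    hidden = softplus(v @ W + b).sum(axis=1)
    return visible - hidden

def flag_outliers(scores, train_scores, percentile=95.0):
    # Hypothetical rule: flag points whose free energy exceeds a
    # chosen percentile of the scores seen on training data.
    return scores > np.percentile(train_scores, percentile)

# Usage with random (untrained) parameters, for shape illustration only.
rng = np.random.default_rng(0)
a, b, W = rng.normal(size=4), rng.normal(size=3), rng.normal(size=(4, 3))
mask = np.array([True, True, False, False])  # two continuous, two binary
data = np.hstack([rng.normal(size=(100, 2)), rng.integers(0, 2, size=(100, 2))])
scores = free_energy(data, a, b, W, mask)
flags = flag_outliers(scores, scores)
```

Because the partition function Z does not depend on v, it never has to be computed: the score needs only one matrix product and a softplus per hidden unit, which is consistent with the abstract's claim that scoring is cheap.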

Bibliographic details

Source: International Conference on Advanced Data Mining and Applications, 2016, pp. 111-125 (15 pages)
Authors: Kien Do; Truyen Tran; Dinh Phung; Svetha Venkatesh
Original format: PDF

Similar literature

1. MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream [J] . Knowledge-Based Systems . 2020, Issue Mara5

2. Thresholds based outlier detection approach for mining class outliers: An empirical case study on software measurement datasets [J] . Oral Alan, Cagatay Catal . Expert Systems with Applications . 2011, Issue 4

3. A functional data approach to missing value imputation and outlier detection for traffic flow data [J] . Jeng-Min Chiou, Yi-Chen Zhang, Wan-Hui Chen . Transportmetrica . 2014, Issue 2

4. Outlier Detection on Mixed-Type Data: An Energy-Based Approach [C] . Kien Do, Truyen Tran, Dinh Phung, International Conference on Advanced Data Mining and Applications . 2016

5. Scalable and efficient outlier detection in large distributed data sets with mixed-type attributes. [D] . Koufakou, Anna. 2009

6. Designing a Streaming Algorithm for Outlier Detection in Data Mining—An Incremental Approach [O] . Kangqing Yu, Wei Shi, Nicola Santoro 2020

7. Outlier Detection on Mixed-Type Data: An Energy-based Approach [O] . Do, Kien, Tran, Truyen, Phung, Dinh, 2016
