An Incremental Outlier Mining Algorithm for Massive Data Based on Grid and Density

Zhang Jing; Sun Zhihui; Yang Ming; Ni Weiwei; Yang Yidong

Abstract

Outlier mining is an important branch of data mining and is widely applied in industrial and financial settings, for example in intrusion detection systems (IDS) and credit card fraud detection. Handling massive, high-dimensional data has become a major task and challenge in the design of outlier detection algorithms. Targeting the characteristics of massive data, this paper proposes IGDLOF, a fast incremental outlier mining algorithm based on grid and density. The algorithm stores a seven-tuple of information per grid cell to reduce both the number and the dimensionality of the data, and uses incremental updates to reduce memory requirements. Dense grids, sparse grids, and neighbor grids are defined so that computation can operate on grid cells directly. Representative points are used to filter out the bulk of the data, and each point is first screened and only then subjected to an approximate density computation, which reduces the amount of computation and lowers the algorithm's complexity. Experiments on different initial and incremental datasets report detection rate, false alarm rate, precision, and average running time. Tests on real and simulated datasets show that IGDLOF maintains the same accuracy as the LOF algorithm while significantly improving execution efficiency.
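The abstract does not spell out the contents of the seven-tuple or the exact filtering rule, so the following Python sketch only illustrates the general idea of grid summaries that are updated incrementally and used to screen points before any density computation. It is not the IGDLOF algorithm itself. The class `GridSummary`, its two stored statistics (point count and per-dimension coordinate sum), and the parameters `cell_width` and `dense_threshold` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
import math


class GridSummary:
    """Per-cell statistics maintained incrementally (hypothetical layout,
    not the paper's seven-tuple)."""

    def __init__(self, cell_width, dense_threshold):
        self.cell_width = cell_width            # assumed uniform cell size
        self.dense_threshold = dense_threshold  # assumed density cut-off
        self.count = defaultdict(int)           # cell -> number of points
        self.linear_sum = {}                    # cell -> per-dimension sums

    def cell_of(self, point):
        """Map a point to the integer coordinates of its grid cell."""
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point):
        """Incremental update: only the summary of one cell changes."""
        cell = self.cell_of(point)
        self.count[cell] += 1
        sums = self.linear_sum.setdefault(cell, [0.0] * len(point))
        for i, x in enumerate(point):
            sums[i] += x
        return cell

    def centroid(self, cell):
        """Representative point of a cell, derived from its summary."""
        n = self.count[cell]
        return tuple(s / n for s in self.linear_sum[cell])

    def neighborhood_count(self, cell):
        """Points in the cell and its adjacent cells (3^d cells in total)."""
        total = 0
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            total += self.count.get(neighbor, 0)
        return total

    def is_candidate(self, point):
        """Screen first: only points in sparse neighborhoods would go on
        to a (more expensive) density computation such as LOF."""
        cell = self.cell_of(point)
        return self.neighborhood_count(cell) < self.dense_threshold


# Usage: a tight cluster plus one isolated point.
grid = GridSummary(cell_width=1.0, dense_threshold=5)
cluster = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
isolated = (10.0, 10.0)
for p in cluster + [isolated]:
    grid.insert(p)

candidates = [p for p in cluster + [isolated] if grid.is_candidate(p)]
print(candidates)
```

In this sketch the 25 clustered points fall into one dense cell and are filtered out by the screening step, so only the isolated point remains as a candidate. Enumerating all adjacent cells costs 3^d lookups per query, which is practical only in low dimensions; how the paper handles high-dimensional neighborhoods is not stated in the abstract.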

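Bibliographic details

Source
Journal of Computer Research and Development (《计算机研究与发展》) | 2011, No. 5 | pp. 823-830 | 8 pages
Authors
Zhang Jing; Sun Zhihui; Yang Ming; Ni Weiwei; Yang Yidong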
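Affiliations
Department of Computer Science and Engineering, Southeast University, Nanjing 210096;
School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, Jiangsu 212001;
School of Computer Science and Technology, Nanjing Normal University, Nanjing 210097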
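Original format: PDF
Language: Chinese
CLC classification: Programming; software engineering
Keywords
massive data; grid; density; outlier mining; incremental; LOF algorithm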

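Similar literature

1. Outlier mining algorithm for high-dimensional large datasets based on grid technology [J]. Guo Long. Telecom World. 2016, No. 21
2. Outlier mining algorithm based on data partitioning and grid [J]. Tang Chenglong, Xing Changzheng. Journal of Computer Applications. 2012, No. 8
3. Outlier mining algorithm for high-dimensional large datasets based on grid technology [J]. Cao Hongqi, Sun Zhihui. Journal of Computer Applications. 2007, No. 10
4. Outlier mining algorithm based on grid clustering technology [J]. Cao Hongqi, Yu Lan, Sun Zhihui. Computer Engineering. 2006, No. 11
5. Local outlier mining algorithm for heterogeneous data based on neighborhood density [J]. Wang Xiaohui, Song Xuekun, Wang Xiaochuan. Computer Simulation. 2021, No. 7
6. Network hot event mining algorithm based on outlier elimination [C]. Wang Gencheng, Li Jun. 30th Anniversary Celebration of the Simulation Application Branch of the China Computer Users Association and 2013 National Conference on Simulation Technology. 2013
7. Research and application of outlier mining algorithms for high-dimensional massive datasets [A]. Zhang Jing. 2010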
