Canadian Journal of Electrical and Computer Engineering

A Novel Clustering Framework for Stream Data
(French title: "Un nouveau cadre de classifications pour les données de flux")

Abstract

There is a growing tendency toward real-time clustering of continuous stream data. A few attempts have been made to improve the off-line phase of stream clustering methods, whereas these methods mostly rely on a simple distance function in their online phase. In practice, clusters have complex shapes; therefore, measuring the distance of an incoming sample to the mean of an asymmetric microcluster can assign the sample to an irrelevant microcluster. In this paper, a novel framework is proposed that can enhance the online phase of any stream clustering method. For each microcluster whose population exceeds a threshold, a dedicated classifier is trained to capture its boundary and statistical properties, and incoming samples are then assigned to microclusters according to the classifiers' scores. The incremental Naïve Bayes classifier is chosen because of its fast learning. DenStream and CluStream were chosen as state-of-the-art methods, and their performance was assessed on nine synthetic and real data sets, with and without the proposed framework. Comparative results in terms of purity, general recall, general precision, concept change traceability, computational complexity, and robustness against noise indicate that the modified methods outperform their original versions.
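To make the online-phase idea concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that each microcluster keeps an incremental Gaussian Naïve Bayes model (per-feature running mean and variance via Welford's algorithm), that a sample's score is that model's log-likelihood, and that when no microcluster has reached the population threshold the sample falls back to nearest-mean assignment. The names `MicroCluster`, `assign`, and `min_pop` are hypothetical, and the toy data only illustrate how an elongated microcluster can lose a nearest-mean comparison yet win on likelihood.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Microcluster summarised by incremental per-feature Gaussian statistics.

    Welford's online algorithm keeps a running mean and sum of squared
    deviations, so each update costs O(d) and no past samples are stored.
    """
    dim: int
    n: int = 0
    mean: list = field(default_factory=list)
    m2: list = field(default_factory=list)

    def __post_init__(self):
        if not self.mean:
            self.mean = [0.0] * self.dim
            self.m2 = [0.0] * self.dim

    def update(self, x):
        # Incremental (one-pass) update of mean and squared deviations.
        self.n += 1
        for j, xj in enumerate(x):
            delta = xj - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (xj - self.mean[j])

    def distance(self, x):
        # Baseline online rule: Euclidean distance to the microcluster mean.
        return math.dist(x, self.mean)

    def nb_log_score(self, x, var_floor=1e-6):
        # Naive Bayes assumption: features are independent given the
        # microcluster, so the log-likelihood is a sum of 1-D Gaussian terms.
        score = 0.0
        for j, xj in enumerate(x):
            var = max(self.m2[j] / self.n, var_floor)
            score += -0.5 * math.log(2 * math.pi * var) - (xj - self.mean[j]) ** 2 / (2 * var)
        return score


def assign(x, clusters, min_pop=10):
    """Hypothetical assignment rule (an assumption, not from the paper):
    score x with the classifiers of microclusters holding at least min_pop
    samples; if none qualifies, fall back to the nearest mean."""
    mature = [i for i, c in enumerate(clusters) if c.n >= min_pop]
    if mature:
        return max(mature, key=lambda i: clusters[i].nb_log_score(x))
    return min(range(len(clusters)), key=lambda i: clusters[i].distance(x))


def make_cluster(points):
    c = MicroCluster(dim=len(points[0]))
    for p in points:
        c.update(p)
    return c


# Toy microcluster A: elongated along the x-axis (asymmetric shape).
a_pts = [(float(x), 0.1 if x % 2 == 0 else -0.1) for x in range(-10, 11)]
# Toy microcluster B: small and compact around (5, 2).
b_pts = [(5 + dx, 2 + dy) for dx in (-0.1, 0.1) for dy in (-0.1, 0.1)] * 3
clusters = [make_cluster(a_pts), make_cluster(b_pts)]

sample = (8.0, 0.0)
print("nearest mean:", assign(sample, clusters, min_pop=10**6))  # → 1 (B)
print("NB scores   :", assign(sample, clusters, min_pop=10))     # → 0 (A)
```

The sample lies along A's long axis but is closer to B's mean, so the distance rule picks B while the likelihood rule, which accounts for each microcluster's own spread, picks A. The paper's actual scoring rule, threshold value, and handling of microclusters below the threshold may differ from this sketch.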