A Support System for Clustering Data Streams with a Variable Number of Clusters

Silva Jonathan de Andrade; Hruschka Eduardo Raul

Abstract

Many algorithms for clustering data streams that are based on the widely used k-Means have been proposed in the literature. Most of these algorithms assume that the number of clusters, k, is known and fixed a priori by the user. Aimed at relaxing this assumption, which is often unrealistic in practical applications, we propose a support system that allows not only estimating the number of clusters automatically from data but also monitoring the process of the data-stream clustering. We illustrate the potential of the proposed system by means of a prototype that implements eight algorithms for clustering data streams, namely, Stream LSearch-OMRk, Stream LSearch-BkM, Stream LSearch-IOMRk, Stream LSearch-IBkM, CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk, and StreamKM++-BkM. These algorithms are combinations of three state-of-the-art algorithms for clustering data streams with fixed k, namely, Stream LSearch, CluStream, and StreamKM++, with two algorithms for estimating the number of clusters, which are Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). We experimentally compare the performance of these algorithms using both synthetic and real-world data streams. Analyses of statistical significance suggest that the algorithms that are based on OMRk yield the best data partitions, while the algorithms that are based on BkM are more computationally efficient. Additionally, StreamKM++-OMRk and Stream LSearch-IBkM provide the best trade-off between accuracy and efficiency.
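The idea behind estimating k by ordered multiple runs of k-Means can be illustrated with a minimal batch sketch: run k-Means several times for each k from 2 up to an upper bound and keep the partition that scores best under a validity index. The abstract does not specify the index, the seeding, or any function names, so everything below is an assumption made for illustration: the index is a simplified (centroid-based) silhouette, the seeding is a deterministic farthest-first pick so the example is reproducible, and `farthest_first`, `kmeans`, `simplified_silhouette`, and `estimate_k` are hypothetical helpers. It is not the authors' implementation and it ignores the streaming side entirely.

```python
import math


def farthest_first(points, k, first):
    """Pick k seeds: start at points[first], then repeatedly take the point
    farthest from the seeds chosen so far (an assumption of this sketch)."""
    seeds = [points[first]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, seeds, iters=100):
    """Plain Lloyd's k-means starting from the given seeds."""
    centroids = list(seeds)
    k = len(centroids)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def simplified_silhouette(points, centroids, labels):
    """Mean of (b - a) / max(a, b), where a is the distance to the point's own
    centroid and b the distance to the nearest other centroid."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centroids[lab])
        b = min(math.dist(p, c) for j, c in enumerate(centroids) if j != lab)
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_max, runs=3):
    """Try k = 2..k_max in increasing order, several runs per k, and return
    (k, centroids, labels) of the best-scoring partition."""
    best_score, best = -math.inf, None
    for k in range(2, k_max + 1):
        for r in range(runs):
            first = (r * len(points)) // runs  # a different first seed per run
            centroids, labels = kmeans(points, farthest_first(points, k, first))
            score = simplified_silhouette(points, centroids, labels)
            if score > best_score:
                best_score, best = score, (k, centroids, labels)
    return best


# Three well-separated 2x2 groups of points (toy data, not from the paper).
blobs = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
         for dx in (0, 1) for dy in (0, 1)]
k, centroids, labels = estimate_k(blobs, k_max=5)
print(k)  # prints 3
```

The search over k is exhaustive within the chosen range, which is consistent with the abstract's observation that OMRk-based variants give better partitions at a higher computational cost than the bisecting approach. Note that a centroid-based silhouette rewards singleton clusters, so the upper bound on k should stay well below the number of points.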


Bibliographic details

Source
ACM Transactions on Autonomous and Adaptive Systems | 2016, Issue 2 | 11.1-11.26 | 26 pages
Authors
Silva Jonathan de Andrade; Hruschka Eduardo Raul;
Author affiliations

Univ Sao Paulo, Sao Carlos, SP, Brazil|Univ Mato Grosso Sul UFMS, Ponta Pora, MS, Brazil;

Univ Sao Paulo, Dept Comp Sci, Sao Carlos, SP, Brazil;

Original format: PDF
Language: English
Keywords
Design; Algorithms; Experimentation; Clustering; data stream; online clustering;


