Journal: ACM Transactions on Autonomous and Adaptive Systems

A Support System for Clustering Data Streams with a Variable Number of Clusters

Abstract

Many algorithms for clustering data streams that are based on the widely used k-Means have been proposed in the literature. Most of these algorithms assume that the number of clusters, k, is known and fixed a priori by the user. To relax this assumption, which is often unrealistic in practical applications, we propose a support system that not only estimates the number of clusters automatically from the data but also monitors the data-stream clustering process. We illustrate the potential of the proposed system by means of a prototype that implements eight algorithms for clustering data streams, namely, Stream LSearch-OMRk, Stream LSearch-BkM, Stream LSearch-IOMRk, Stream LSearch-IBkM, CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk, and StreamKM++-BkM. These algorithms combine three state-of-the-art algorithms for clustering data streams with fixed k, namely, Stream LSearch, CluStream, and StreamKM++, with two algorithms for estimating the number of clusters: Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). We experimentally compare the performance of these algorithms using both synthetic and real-world data streams. Analyses of statistical significance suggest that the algorithms based on OMRk yield the best data partitions, while the algorithms based on BkM are more computationally efficient. Additionally, StreamKM++-OMRk and Stream LSearch-IBkM provide the best trade-off between accuracy and efficiency.
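The abstract names Ordered Multiple Runs of k-Means (OMRk) without describing it. As a rough illustration of the general idea the name suggests (run k-Means several times for each candidate k, in increasing order, and keep the partition that scores best on a cluster-validity index), the following batch sketch may help. The simplified-silhouette criterion, the function names (`kmeans`, `simplified_silhouette`, `estimate_k`), and the defaults are assumptions of this sketch, not details taken from the abstract, and the sketch does not attempt to reproduce any of the eight stream algorithms.

```python
import math
import random


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of coordinate tuples.

    Returns (centroids, labels). Initial centroids are k distinct
    points drawn at random (an assumption of this sketch).
    """
    centroids = [tuple(p) for p in rng.sample(points, k)]

    def assign(cs):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda j: math.dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Mean of the cluster members, coordinate by coordinate.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment so labels always match the returned centroids.
    return centroids, assign(centroids)


def simplified_silhouette(points, centroids, labels):
    """Mean of (b - a) / max(a, b) over all points, where a is the distance
    to the point's own centroid and b the distance to the nearest other one."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centroids[lab])
        b = min(math.dist(p, c) for j, c in enumerate(centroids) if j != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def estimate_k(points, k_max, runs=10, seed=0):
    """Try k = 2..k_max in order, `runs` random restarts each, and return
    (best_k, centroids, labels) for the highest-scoring partition."""
    rng = random.Random(seed)
    best_score, best = -math.inf, None
    for k in range(2, k_max + 1):
        for _ in range(runs):
            centroids, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids, labels)
            if score > best_score:
                best_score, best = score, (k, centroids, labels)
    return best
```

Calling `estimate_k(points, k_max=5)` on a list of coordinate tuples returns the selected k together with the matching centroids and labels. Trying every candidate k with several restarts is simple but costly, which fits the abstract's observation that the OMRk-based variants produce better partitions while the BkM-based ones are cheaper to run.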