Conference: International Conference on Digital Economy

A New Spark Based K-Means Clustering with Data Removing Strategy

Abstract

Clustering is an important machine learning technique for organizing data into groups of similar data points, called clusters. However, conventional clustering methods are not suitable for large-scale data: their high computational cost makes building the clusters take an impractically long time. In this work, we propose a new Spark-based K-means clustering method with a data removing strategy (SKMDRS). The method reduces computation time by removing, at each iteration, the data points that are unlikely to change their cluster membership in later iterations. In addition, the clustering process is distributed with the Spark framework to improve scalability. Experiments show the efficiency of the proposed method compared with existing ones.
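The abstract does not say how SKMDRS decides that a point is "unlikely to change" its cluster, nor how iterations are laid out on Spark. The following is a minimal plain-Python sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical removal rule (a point is removed once its label has stayed the same for `patience` consecutive iterations), keeps each removed point's contribution in its last cluster's sum and count so centroids stay means over all points, and only imitates the shape of a Spark job: `partitions` stands in for data partitions, `map_partition` for per-partition work, and `add_into` for the reduce that combines partial sums. Spark itself is not used, and all function and parameter names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def map_partition(records, centroids, patience):
    """Per-partition work (what one Spark task would do in a real job).

    records: (point, previous_label, iterations_unchanged) for active points.
    Returns partial (sums, counts) per cluster, the records that stay active,
    and the (label, point) pairs removed in this iteration.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    survivors, removed = [], []
    for point, prev_label, unchanged in records:
        label = nearest(point, centroids)
        unchanged = unchanged + 1 if label == prev_label else 0
        counts[label] += 1
        for d, x in enumerate(point):
            sums[label][d] += x
        # Hypothetical removal rule: label stable for `patience` iterations.
        if unchanged >= patience:
            removed.append((label, point))
        else:
            survivors.append((point, label, unchanged))
    return sums, counts, survivors, removed


def add_into(acc_sums, acc_counts, sums, counts):
    """Reduce step: add partial sums and counts into the accumulators in place."""
    for j, c in enumerate(counts):
        acc_counts[j] += c
        for d, x in enumerate(sums[j]):
            acc_sums[j][d] += x


def kmeans_with_removal(partitions, init_centroids, patience=2, max_iter=50, tol=1e-9):
    """K-means with a data-removal heuristic; returns (centroids, active_history).

    active_history[i] is the number of points still processed in iteration i.
    """
    centroids = [list(c) for c in init_centroids]
    k, dim = len(centroids), len(centroids[0])
    active = [[(p, -1, 0) for p in part] for part in partitions]
    # Removed points stay in their last cluster; their contribution is kept here.
    frozen_sums = [[0.0] * dim for _ in range(k)]
    frozen_counts = [0] * k
    history = []
    for _ in range(max_iter):
        history.append(sum(len(part) for part in active))
        if history[-1] == 0:
            break
        results = [map_partition(part, centroids, patience) for part in active]
        # Start from the removed points' fixed contribution, then add active points.
        tot_sums = [row[:] for row in frozen_sums]
        tot_counts = frozen_counts[:]
        for sums, counts, _, _ in results:
            add_into(tot_sums, tot_counts, sums, counts)
        new_centroids = [
            [s / tot_counts[j] for s in tot_sums[j]] if tot_counts[j] else centroids[j]
            for j in range(k)
        ]
        # Points removed in this iteration leave the active set for good.
        for _, _, _, removed in results:
            for label, point in removed:
                frozen_counts[label] += 1
                for d, x in enumerate(point):
                    frozen_sums[label][d] += x
        active = [survivors for _, _, survivors, _ in results]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, history


if __name__ == "__main__":
    parts = [
        [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
        [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (1.0, 1.0), (11.0, 11.0)],
    ]
    cents, hist = kmeans_with_removal(parts, [(0.0, 0.0), (0.0, 1.0)], patience=1)
    print(cents, hist)  # [[0.5, 0.5], [10.5, 10.5]] [8, 8, 2]
```

In the example run, six of the eight points are removed after the second iteration, so the third iteration reassigns only two points. The trade-off of any such rule is that a removed point keeps its label even if later centroid moves would have reassigned it, so results can differ from standard K-means. In an actual Spark job, the per-partition step would map naturally onto something like `RDD.mapPartitions`, with the partial sums combined by a reduce.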