Conference: International Conference on Distributed Computing and Internet Technologies

Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark

Abstract

Clustering divides data into meaningful, useful groups, known as clusters, without any prior knowledge about the data. One drawback of K-Means clustering is the choice of initial centroids, which influences the performance of the algorithm. To overcome this issue, optimization algorithms such as Bat and Firefly are executed as a pre-processing step. These algorithms return optimal centroids, which are given as input to the K-Means algorithm. Because clustering is carried out on large data sets, Apache Spark, an open-source distributed computing framework, is used. The performance of the optimization algorithms is evaluated and the best algorithm is determined.
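The abstract describes a two-stage pipeline: a metaheuristic proposes initial centroids, and K-Means then refines them. A minimal single-machine sketch of that idea follows. It assumes the Firefly algorithm as the optimizer and the within-cluster sum of squared errors (SSE) as its fitness function. The paper's actual fitness function, parameter values, and Spark implementation are not given here, so the helper names (`firefly_centroids`, `kmeans`), the defaults for `alpha`, `beta0`, and `gamma`, and the scaling of distances by the data range are all hypothetical choices for illustration.

```python
# Sketch only: Firefly-seeded K-Means on a single machine (no Spark).
# The SSE fitness, parameter defaults, and range-based distance scaling
# are assumptions of this example, not settings taken from the paper.
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def sse(points: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Within-cluster sum of squared errors, the objective K-Means minimizes."""
    return sum(sq_dist(p, centroids[nearest(p, centroids)]) for p in points)


def firefly_centroids(points: Sequence[Point], k: int, n_fireflies: int = 15,
                      n_iter: int = 50, alpha: float = 0.2, beta0: float = 1.0,
                      gamma: float = 1.0, seed: int = 0) -> List[Point]:
    """Search for k initial centroids with the Firefly algorithm (hypothetical helper).

    Each firefly is one candidate set of k centroids, flattened into a vector;
    lower SSE means a brighter firefly.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Largest per-dimension range, used to scale distances and random steps.
    scale = max(max(p[d] for p in points) - min(p[d] for p in points)
                for d in range(dim)) or 1.0

    def unflatten(v: List[float]) -> List[Point]:
        return [tuple(v[i * dim:(i + 1) * dim]) for i in range(k)]

    # Start each firefly from k distinct data points.
    swarm = [[x for p in rng.sample(list(points), k) for x in p]
             for _ in range(n_fireflies)]
    cost = [sse(points, unflatten(v)) for v in swarm]
    best_i = min(range(n_fireflies), key=lambda i: cost[i])
    best, best_cost = list(swarm[best_i]), cost[best_i]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter, so i moves toward j
                    r2 = sq_dist(swarm[i], swarm[j]) / scale ** 2
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = [xi + beta * (xj - xi)
                                + alpha * scale * (rng.random() - 0.5)
                                for xi, xj in zip(swarm[i], swarm[j])]
                    cost[i] = sse(points, unflatten(swarm[i]))
                    if cost[i] < best_cost:
                        best, best_cost = list(swarm[i]), cost[i]
        alpha *= 0.97  # shrink the random walk over time
    return unflatten(best)


def kmeans(points: Sequence[Point], init_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-Means refinement starting from the supplied centroids."""
    centroids = [tuple(float(x) for x in c) for c in init_centroids]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # mean of the points assigned to cluster c
                new_centroids.append(tuple(sum(col) / len(members)
                                           for col in zip(*members)))
            else:  # empty cluster: keep its previous centroid
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    seeds = firefly_centroids(data, k=2)
    centroids, labels = kmeans(data, seeds)
    print(sorted(centroids), labels)
```

Keeping the optimizer and the refinement as separate functions mirrors the pre-processing design in the abstract. A Bat-algorithm search could replace `firefly_centroids` without touching `kmeans`. On Spark, the refinement stage would instead run distributed; for example, the RDD-based MLlib K-Means can be started from a supplied initial model. This sketch keeps everything in memory for clarity.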