
A parallel k-means clustering algorithm based on redundance elimination and extreme points optimization employing MapReduce



Abstract

When facing massive statistical data, the k-means algorithm struggles to meet data-processing demands because it lacks an effective parallel mechanism. This paper proposes an improved k-means algorithm (IMR-KCA) for clustering analysis of medical data using the MapReduce computing framework. After analyzing the heavy redundancy in traditional k-means algorithms, we first propose a selection model that simplifies computations involving multiple cluster centers, and we prove the correctness of this model through several theorems. Second, we provide a method for calculating the distances from extreme points to center points, replacing the original Euclidean distance with Manhattan distance; a group of theorems is proposed to prove the correctness of this simplification. Next, we provide a set of implementation algorithms that parallelize the clustering computation on the MapReduce framework. Finally, experimental results show that IMR-KCA is more reliable and efficient than direct MapReduce parallelization of traditional clustering algorithms.
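
To make the setting concrete, the following is a minimal single-process sketch of one way a k-means iteration can be split into a map step (assign each point to its nearest center by Manhattan distance) and a reduce step (recompute each center). It is not IMR-KCA: the selection model and extreme-point optimization are not specified in this record and are not reproduced here. The function names (`manhattan`, `map_phase`, `reduce_phase`, `kmeans`) and the mean-based center update are assumptions made for illustration only.

```python
from collections import defaultdict


def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def map_phase(points, centers):
    """Map step: emit (index of nearest center, point) for every point."""
    for p in points:
        idx = min(range(len(centers)), key=lambda i: manhattan(p, centers[i]))
        yield idx, p


def reduce_phase(pairs, centers):
    """Reduce step: group points by center index and average each group.

    Assumption: centers are updated as coordinate-wise means, as in
    standard k-means. Centers that receive no points are left unchanged.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = list(centers)
    for idx, members in groups.items():
        dim = len(members[0])
        new_centers[idx] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Repeat map and reduce until the centers stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centers), centers)
        if updated == centers:
            break
        centers = updated
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In a real MapReduce job the map step would run in parallel over data splits and the grouping would be done by the framework's shuffle; here both are simulated in one process to keep the sketch self-contained.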

Bibliographic record

  • Source
    Concurrency and Computation | 2017, Issue 20 | e4109.1-e4109.18 | 18 pages
  • Author affiliations

    College of Information Science and Engineering, Hunan University, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Hunan 410082, China;

    College of Information Science and Engineering, Hunan University, Hunan 410082, China;

    Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410004, P. R. China;

    College of Information Science and Engineering, Hunan University, Hunan 410082, China;

  • Original format: PDF
  • Language: English
  • Keywords

    clustering algorithms; extreme point; k-means; MapReduce; redundant distance;

