Journal: Journal of Grid Computing

Towards Efficient Ensemble Hierarchical Clustering with MapReduce-based Clusters Clustering Technique and the Innovative Similarity Criterion


Abstract

Data plays a fundamental role in daily life, and the rapid growth of data production has led to the big data revolution. Managing and analyzing this data, which is often unlabeled, is a major practical challenge. Clustering is one of the most important branches of data mining; its purpose is to divide data into meaningful subsets called clusters. Hierarchical clustering is an unsupervised learning approach that groups data points with similar properties and is built on the construction and analysis of dendrograms. Over the decades, many clustering algorithms have been developed using different approaches. This paper introduces an efficient ensemble hierarchical clustering algorithm based on a MapReduce-based clusters clustering technique and an innovative similarity criterion. The main idea of ensemble clustering is to combine the results of different single clustering methods. Ensemble techniques usually produce better results than single methods because they combine multiple learners, so aggregating hierarchical clustering methods can be expected to yield higher-quality clusterings. MapReduce is a programming model for implementing big data applications, and it is used here to implement the hierarchical clustering methods. The similarity between samples is calculated with the innovative similarity criterion. The proposed approach has three steps. In the first step, the data are clustered by several single hierarchical clustering methods. In the second step, hyper-clusters are generated by applying the clusters clustering technique. In the third step, the final clusters are formed by allocating samples to the hyper-clusters. Simulations on multiple real-world datasets show that the proposed approach performs better than algorithms such as CHC and RCESCC.
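
The three steps in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' implementation: the abstract does not define the innovative similarity criterion or the MapReduce layout, so this sketch assumes Jaccard overlap as a stand-in similarity between clusters, a majority vote for allocating samples, and plain Python loops in place of MapReduce jobs. All function names (`agglomerative`, `hyper_clusters`, `allocate`, `ensemble`) are hypothetical.

```python
from itertools import combinations
from math import dist


def agglomerative(points, k, linkage):
    """Step 1: naive agglomerative clustering; returns k sets of sample indices."""
    clusters = [{i} for i in range(len(points))]

    def gap(a, b):
        d = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(d)
        if linkage == "complete":
            return max(d)
        return sum(d) / len(d)  # average linkage

    while len(clusters) > k:
        # Merge the closest pair; a < b, so deleting b keeps index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


def jaccard(a, b):
    """Assumed stand-in for the paper's similarity criterion."""
    return len(a & b) / len(a | b)


def hyper_clusters(base_clusters, k):
    """Step 2: cluster the base clusters themselves into k hyper-clusters."""
    groups = [[c] for c in base_clusters]

    def sim(g, h):
        return sum(jaccard(a, b) for a in g for b in h) / (len(g) * len(h))

    while len(groups) > k:
        a, b = max(combinations(range(len(groups)), 2),
                   key=lambda p: sim(groups[p[0]], groups[p[1]]))
        groups[a] += groups[b]
        del groups[b]
    return groups


def allocate(n_samples, groups):
    """Step 3: give each sample the hyper-cluster that contains it most often."""
    labels = []
    for i in range(n_samples):
        votes = [sum(i in c for c in g) for g in groups]
        labels.append(votes.index(max(votes)))
    return labels


def ensemble(points, k, linkages=("single", "complete", "average")):
    # Each base clustering is independent of the others, so this loop is the
    # part a distributed (e.g. MapReduce-style) job could run in parallel.
    base = [c for m in linkages for c in agglomerative(points, k, m)]
    return allocate(len(points), hyper_clusters(base, k))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ensemble(points, k=2)
```

On this toy input, `labels` puts the first three points in one final cluster and the last three in another. The naive pairwise search is only practical for small inputs; it is meant to show the data flow between the three steps, not to scale.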
