Conference: International Conference on Machine Learning, Big Data and Business Intelligence
A Parallel Adaptive DBSCAN Algorithm Based on k-Dimensional Tree Partition

Abstract

Existing parallel DBSCAN (density-based spatial clustering of applications with noise) algorithms require their parameters to be set manually, and they access the dataset repeatedly during data partitioning and data merging, which reduces execution efficiency. This paper therefore proposes a parallel adaptive DBSCAN algorithm based on k-dimensional tree partitioning. The algorithm uses a k-dimensional tree to divide the dataset into several balanced partitions and processes them in parallel on the Spark distributed computing framework, which increases the concurrent processing ability of the program and improves I/O access speed. In addition, an improved adaptive DBSCAN parameter method is applied to each partition to obtain local clusters, which removes the arbitrariness of manually chosen parameters and preserves clustering quality. While creating the local clusters, the algorithm also stores the mapping between data points and their adjacent points in a HashMap on the master node and uses it to merge the local clusters into global clusters, which reduces the time cost of merging. Experimental results show that the proposed algorithm takes about 18% less running time than the RDD-DBSCAN algorithm without reducing clustering quality. Running efficiency improves further as the number of cluster nodes increases, so the algorithm is suitable for clustering analysis of massive datasets.
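The abstract does not give implementation details, so the following is only a minimal single-machine sketch of two of the ideas it names: balanced k-dimensional tree partitioning and hash-map-driven merging of local clusters. Every name here (`kd_partition`, `merge_local_clusters`, `max_size`, `core_neighbors`) is hypothetical. The median split rule, the union-find merge, and the assumption that the map holds only core points as keys are illustrative choices, not the paper's method. Spark and the adaptive parameter selection are left out.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kd_partition(points: Sequence[Point], max_size: int, depth: int = 0) -> List[List[Point]]:
    """Split points at the median of one axis, cycling axes by depth,
    until every partition holds at most max_size points."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(points) <= max_size:
        return [list(points)]
    axis = depth % len(points[0])      # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2            # median split keeps the halves balanced
    return (kd_partition(ordered[:mid], max_size, depth + 1)
            + kd_partition(ordered[mid:], max_size, depth + 1))


def merge_local_clusters(local_label: Dict[int, str],
                         core_neighbors: Dict[int, List[int]]) -> Dict[int, str]:
    """Merge local cluster labels into global ones.

    local_label maps a point id to its local cluster label.
    core_neighbors maps a core point id to the ids of its adjacent points
    (assumption: only core points appear as keys).
    """
    # Union-find over cluster labels; each label starts as its own root.
    parent: Dict[str, str] = {lab: lab for lab in set(local_label.values())}

    def find(lab: str) -> str:
        while parent[lab] != lab:
            parent[lab] = parent[parent[lab]]   # path halving
            lab = parent[lab]
        return lab

    for p, neighbors in core_neighbors.items():
        for q in neighbors:
            if p in local_label and q in local_label:
                a, b = find(local_label[p]), find(local_label[q])
                if a != b:
                    # Keep the lexicographically smaller label as the root.
                    parent[max(a, b)] = min(a, b)
    return {pid: find(lab) for pid, lab in local_label.items()}
```

In a distributed setting, each list returned by `kd_partition` would be clustered on a separate worker, and only the label and adjacency maps would need to reach the master node for the merge step.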
