A Parallel Adaptive DBSCAN Algorithm Based on k-Dimensional Tree Partition

Abstract

Existing parallel DBSCAN (density-based spatial clustering of applications with noise) algorithms require their parameters to be set manually, and they access the dataset repeatedly during data partitioning and data merging, which reduces execution efficiency. This paper therefore proposes a parallel adaptive DBSCAN algorithm based on k-dimensional tree partitioning. It uses a k-dimensional tree to divide the dataset into several balanced data partitions and processes them in parallel on the Spark distributed computing framework, which increases the algorithm's concurrency and improves I/O access speed. In addition, an improved adaptive DBSCAN parameter method is applied to each data partition to obtain local clusters, which removes the arbitrariness of manually set parameters and safeguards the quality of the clustering results. While creating local clusters, the algorithm also stores the mapping between data points and their adjacent points in a HashMap data structure on the master node and uses it to merge local clusters into global clusters, which reduces the time cost of data merging. Experimental results show that the proposed algorithm saves about 18% of the running time compared with the RDD-DBSCAN algorithm without reducing clustering quality. Running efficiency improves further as the number of cluster nodes increases, so the algorithm is suitable for clustering analysis of massive data.
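
The two structural ideas in the abstract, balanced k-dimensional tree partitioning and map-based merging of local clusters, can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the function names `kd_partition` and `merge_local_clusters`, the `max_size` threshold, the median split rule, and the assumption that local clusters sharing a point belong to the same global cluster are all hypothetical choices made for illustration. The Spark execution and the adaptive parameter selection are omitted.

```python
def kd_partition(points, max_size, depth=0):
    """Split points at the median of one axis, cycling through the axes,
    until every partition holds at most max_size points (assumed rule)."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if len(points) <= max_size:
        return [points]
    axis = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                # median split keeps halves balanced
    return (kd_partition(ordered[:mid], max_size, depth + 1)
            + kd_partition(ordered[mid:], max_size, depth + 1))


def merge_local_clusters(local_labels):
    """Merge local clusters into global ones with a dictionary-backed union-find.

    local_labels maps a point id to the non-empty list of local cluster
    labels it received. A point carrying labels from several partitions
    (for example a border point) links those local clusters together;
    this linking rule is an assumption of the sketch.
    """
    parent = {}

    def find(label):
        parent.setdefault(label, label)
        while parent[label] != label:
            parent[label] = parent[parent[label]]   # path halving
            label = parent[label]
        return label

    for labels in local_labels.values():
        root = find(labels[0])
        for other in labels[1:]:
            parent[find(other)] = root              # union the two clusters

    # Each point is reported under the representative of its first label.
    return {pid: find(labels[0]) for pid, labels in local_labels.items()}
```

Calling `kd_partition` on a 4 x 4 grid of points with `max_size=4` yields four partitions of four points each, and `merge_local_clusters` collapses any local labels that share a point into one representative label. In a distributed setting, each partition would be clustered on a worker and only the point-to-label map would be collected for merging.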

Bibliographic record

Source
International Conference on Machine Learning, Big Data and Business Intelligence, 2020, pp. 249-256 (8 pages)
Authors
Xin Lu; Yu Wang; Jiao Yuan; Xun Wang; Kun Fu; Ke Yang
Original format
PDF
Keywords
Machine learning algorithms; Merging; Clustering algorithms; Data structures; Partitioning algorithms; Sparks; Data mining

Similar literature

1. Fuzzy Control Simultaneous Localization and Mapping Strategy Based on Iterative Closest Point and k-Dimensional Tree Algorithms [J]. Jih-Gau Juang, Jia-An Wang. Sensors and Materials. 2015, No. 8.
2. Association rules mining in parallel conditional tree based on grid computing inspired partition algorithm [J]. Wang Chunzhi, Bian Wenshuo, Wang Ruoxi, et al. International Journal of Web and Grid Services. 2020, No. 3.
3. Multi-density DBSCAN Algorithm Based on Density Levels Partitioning [J]. Zhongyang Xiong, Ruotian Chen, Yufang Zhang, et al. Journal of Information and Computational Science. 2012, No. 10.
4. Parallel DBSCAN Algorithm Using a Data Partitioning Strategy with Spark Implementation [C]. Dianwei Han, Ankit Agrawal, Wei-keng Liao, et al. IEEE International Conference on Big Data. 2018.
5. A near real-time, highly scalable, parallel and distributed adaptive object detection and re-training framework based on the AdaBoost algorithm [D]. Abualkibash, Munther. 2015.
6. Fast Parallel MR Image Reconstruction via B1-based Adaptive Restart Iterative Soft Thresholding Algorithms (BARISTA) [O]. Matthew J. Muckley, Douglas C. Noll, Jeffrey A. Fessler.
7. Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform [O]. Fang Huang, Qiang Zhu, Ji Zhou, et al. 2017.
8. Parallel algorithms for the adaptive refinement and partitioning of unstructured meshes [R]. Jones, M. T., Plassmann, P. E. 1994.