...
Journal of Computer Science & Technology

Approaches for Scaling DBSCAN Algorithm to Large Spatial Databases



Abstract

The huge amount of information stored in databases owned by corporations (e.g., retail, financial, telecom) has spurred tremendous interest in knowledge discovery and data mining. In data mining, clustering is a useful technique for discovering interesting data distributions and patterns in the underlying data. It has many fields of application, such as statistical data analysis, pattern recognition, image processing, and various business applications. Researchers have worked on clustering algorithms for decades and have developed many of them, yet there is still no efficient algorithm for clustering very large databases and high-dimensional data. DBSCAN, an outstanding representative of clustering algorithms, performs well on spatial data. However, for large spatial databases DBSCAN requires a large amount of memory and can incur substantial I/O costs, because it operates directly on the entire database. In this paper, several approaches are proposed to scale DBSCAN to large spatial databases. First, a fast DBSCAN algorithm is developed that considerably speeds up the original DBSCAN algorithm. Then a sampling-based DBSCAN algorithm, a partitioning-based DBSCAN algorithm, and a parallel DBSCAN algorithm are introduced in turn. Next, a synthetic algorithm based on the above algorithms is given. Finally, experimental results are presented to demonstrate the effectiveness and efficiency of these algorithms.
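For readers unfamiliar with the baseline being scaled, the following is a minimal sketch of the standard, textbook DBSCAN procedure. It is not the paper's fast, sampling-based, partitioning-based, or parallel variant, whose details the abstract does not give. The function names `region_query` and `dbscan` are hypothetical, and `eps` and `min_pts` are the conventional names for the neighborhood radius and the minimum neighborhood size. Each region query here is a brute-force linear scan over the whole dataset, so n queries cost O(n²) distance computations. This illustrates why running DBSCAN directly on an entire large database is expensive.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1  # label for points not (yet) density-reachable from any core point


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Hypothetical helper: indices of all points within eps of points[i], including i.

    Brute-force linear scan: every call touches the entire dataset.
    """
    p = points[i]
    return [j for j, q in enumerate(points) if math.dist(p, q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN sketch: returns a cluster id (0, 1, ...) or NOISE per point."""
    labels: List[Optional[int]] = [None] * len(points)  # None means unvisited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return [NOISE if label is None else label for label in labels]
```

As a usage example, two tight groups of four points and one distant outlier, clustered with `eps=1.5` and `min_pts=3`, give `[0, 0, 0, 0, 1, 1, 1, 1, -1]`: two clusters plus one noise point. A spatial index or the kinds of sampling and partitioning the abstract mentions would target the `region_query` cost and the need to hold all points in memory.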
