Journal: Pattern Recognition Letters

GCHL: A grid-clustering algorithm for high-dimensional very large spatial data bases



Abstract

Spatial clustering, which groups similar spatial objects into classes, is an important component of spatial data mining [Han and Kamber, Data Mining: Concepts and Techniques, 2000]. Because of its wide range of applications, spatial clustering has been a highly active topic in data mining research, and a number of effective, scalable clustering methods have been developed recently. These spatial clustering methods can be classified into four categories: partitioning methods, hierarchical methods, density-based methods, and grid-based methods. Clustering large data sets of high dimensionality has always been a serious challenge for clustering algorithms. Many recently developed clustering algorithms have attempted to handle either data sets with a very large number of records or data sets with a very high number of dimensions. The new clustering method GCHL (a Grid-Clustering algorithm for High-dimensional very Large spatial databases) combines a novel density-grid based clustering with an axis-parallel partitioning strategy to identify areas of high density in the input data space. The algorithm works equally well in the feature space of any data set. The method operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their capability of discovering concave/deeper and convex/higher regions, their robustness to outliers and noise, and GCHL's excellent scalability.
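The abstract does not spell out GCHL's procedure, so the following is not the authors' algorithm. It is a minimal sketch of the general idea behind density-grid clustering on an axis-parallel grid, under stated assumptions: a fixed cell width (`cell_size`), a density threshold (`min_count`), and face-adjacent cells treated as neighbors. The function name `grid_cluster` and both parameters are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_count):
    """Label dense grid cells; adjacent dense cells share a cluster label.

    Illustrative sketch only (not GCHL itself). `cell_size` and
    `min_count` are assumed parameters, not values from the paper.
    """
    # Single scan over the data: count points per axis-parallel cell.
    # Only non-empty cells are stored, so memory grows with the number
    # of occupied cells rather than with the number of points.
    counts = defaultdict(int)
    for p in points:
        cell = tuple(int(x // cell_size) for x in p)
        counts[cell] += 1

    # Cells at or above the threshold are treated as high-density.
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Breadth-first search joins face-adjacent dense cells. Each cell
    # has 2*d such neighbors, which avoids the 3^d - 1 neighbors of a
    # full neighborhood in high dimensions.
    labels = {}
    next_label = 0
    for start in dense:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = next_label
                        queue.append(nb)
        next_label += 1
    return labels
```

The returned dictionary maps each dense cell to a cluster label. Points that fall in cells absent from the dictionary can be treated as noise, which is one simple way a grid method gains robustness to outliers.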
