Abstract: To address the shortcomings of the density-based spatial clustering of applications with noise (DBSCAN) algorithm, an improved DBSCAN algorithm is proposed. It is based on two ideas: divide and conquer, and efficient parallel processing. The data are partitioned so that the divide-and-conquer approach reduces the influence of the single global parameter Eps. Parallel processing and dimensionality reduction are used to improve clustering efficiency and to lower DBSCAN's high memory requirements. Finally, incremental processing is used to handle the effect that inserting or deleting data objects has on the clustering. The results show that the new method effectively solves these problems of DBSCAN. Both its clustering efficiency and its cluster quality are markedly better than those of the original DBSCAN algorithm.
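The abstract does not say how the data are partitioned, how Eps is chosen for each partition, or how clusters that cross a partition boundary are merged. The following is therefore only a minimal sketch of the general "partition, then cluster each part with its own Eps, in parallel" idea, and it rests on assumptions that are not from the paper. It splits the data into equal-count strips along one axis. Each strip receives a caller-supplied Eps value. Boundary-crossing clusters are not merged. The names `partitioned_dbscan` and `eps_per_part` are hypothetical. The sketch uses `ThreadPoolExecutor` for portability. Because of the GIL, real CPU speedups would need `ProcessPoolExecutor` with a module-level worker function. Dimensionality reduction and incremental insertion or deletion are not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with brute-force O(n^2) neighbour queries.

    Returns one label per point: a cluster id >= 0, or NOISE (-1).
    """
    n = len(points)
    labels = [None] * n  # None = not visited yet
    cluster = -1

    def neighbours(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: labelled, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


def partitioned_dbscan(points, eps_per_part, min_pts, axis=0):
    """Hypothetical divide-and-conquer variant (not the paper's exact method).

    Sorts points along `axis`, cuts them into len(eps_per_part) equal-count
    strips, clusters each strip with its own Eps in a thread pool, and
    offsets the local cluster ids so they are globally unique.
    Clusters that straddle a strip boundary are NOT merged.
    """
    n, k = len(points), len(eps_per_part)
    order = sorted(range(n), key=lambda i: points[i][axis])
    bounds = [t * n // k for t in range(k + 1)]
    parts = [order[bounds[t]:bounds[t + 1]] for t in range(k)]

    def run(t):
        idx = parts[t]
        return idx, dbscan([points[i] for i in idx], eps_per_part[t], min_pts)

    labels = [NOISE] * n
    offset = 0
    with ThreadPoolExecutor() as pool:
        for idx, local in pool.map(run, range(k)):  # results in strip order
            for i, lab in zip(idx, local):
                labels[i] = NOISE if lab == NOISE else lab + offset
            offset += max(local, default=-1) + 1
    return labels


if __name__ == "__main__":
    dense = [(0.1 * i, 0.0) for i in range(6)]    # spacing 0.1
    sparse = [(10.0 + i, 0.0) for i in range(6)]  # spacing 1.0
    pts = dense + sparse
    print(dbscan(pts, 0.15, 3))                     # sparse group -> noise
    print(partitioned_dbscan(pts, [0.15, 1.5], 3))  # both groups found
```

In this toy data, a single small global Eps of 0.15 labels the sparse group as noise. Giving the second strip its own Eps of 1.5 recovers it as a separate cluster. A real implementation would also need a rule for choosing each strip's Eps and a merge step for clusters that cross strip boundaries.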