Journal: International Journal of Applied Engineering Research

Dice Similarity Based Ensemble Clustering for Sparsely Distributed High Dimensional Data



Abstract

Data mining is the process of extracting valuable patterns from large volumes of data. In data mining, clustering divides data points into groups according to their similarity. Clustering high-dimensional data is more complicated because such data is intrinsically sparse. Existing clustering techniques such as fuzzy c-means and spectral clustering have not improved clustering accuracy or similarity measurement time. To overcome these limitations, the Dice Similarity Threshold based Ensemble Clustering (DST-EC) technique is introduced. DST-EC clusters sparsely distributed high-dimensional data points based on their similarity values. First, a Dice Similarity Coefficient Measurement Algorithm measures the similarity between two high-dimensional data points with minimal similarity measurement time and a higher true positive rate. After the similarities are computed, different similarity threshold ranges are set for clustering the data points. Finally, based on the similarity threshold value, the Similarity Threshold Ensemble Clustering Algorithm groups similar data points into clusters with higher clustering accuracy. The performance of DST-EC is measured in terms of true positive rate, similarity measurement time, and clustering accuracy on the El Nino weather data set from the UCI Machine Learning Repository. The experimental results show that DST-EC improves clustering accuracy by 15% and reduces similarity measurement time by 21% compared with state-of-the-art works.
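
The abstract does not give the formulas or pseudocode behind DST-EC, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the common real-valued form of the Dice coefficient, 2(x·y) / (|x|² + |y|²), and a simple greedy grouping rule in which a point joins the first cluster whose first member is at least `threshold` similar to it. The function names `dice_similarity` and `threshold_cluster`, the greedy rule, and the sample data are all hypothetical; the ensemble step over several threshold ranges is not attempted here.

```python
from math import fsum


def dice_similarity(x, y):
    """Dice coefficient for real-valued vectors: 2*(x.y) / (|x|^2 + |y|^2).

    Assumed formula; the abstract does not state which variant is used.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = fsum(a * b for a, b in zip(x, y))
    norms = fsum(a * a for a in x) + fsum(b * b for b in y)
    # Two all-zero vectors are treated as identical to avoid dividing by zero.
    return 1.0 if norms == 0 else 2.0 * dot / norms


def threshold_cluster(points, threshold):
    """Greedy grouping (hypothetical rule): a point joins the first cluster
    whose representative (its first member) has Dice similarity >= threshold;
    otherwise it starts a new cluster. Returns lists of point indices.
    """
    clusters = []
    for i, point in enumerate(points):
        for members in clusters:
            if dice_similarity(points[members[0]], point) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Illustrative sparse vectors (made up, not from the El Nino data set).
points = [
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.5],
]
print(threshold_cluster(points, threshold=0.8))  # [[0, 1], [2, 3]]
```

Because sparse vectors share few non-zero coordinates, the dot product in the numerator only picks up overlapping dimensions, which is one reason an overlap-based measure such as Dice is a natural fit for this kind of data.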
