Dice Similarity Based Ensemble Clustering for Sparsely Distributed High Dimensional Data

R. Pushpalatha; K. Meenakshi Sundaram

Abstract

Data mining is the process of extracting valuable patterns from large volumes of data. Clustering in data mining is the process of grouping data points according to their similarity. Clustering high-dimensional data is more difficult because such data are intrinsically sparse. Existing clustering techniques such as fuzzy c-means and spectral clustering do not improve clustering accuracy or similarity measurement time. To overcome these limitations, the Dice Similarity Threshold based Ensemble Clustering (DST-EC) technique is introduced. DST-EC clusters sparsely distributed high-dimensional data points based on their similarity values. First, a Dice Similarity Coefficient Measurement Algorithm measures the similarity between two high-dimensional data points with minimal similarity measurement time and a higher true positive rate. After the similarities are computed, different similarity threshold ranges are set for clustering the data points. Finally, based on the similarity threshold value, the Similarity Threshold Ensemble Clustering Algorithm groups similar data points into a number of clusters with higher clustering accuracy. The performance of DST-EC is measured in terms of true positive rate, similarity measurement time, and clustering accuracy on the El Nino weather data sets from the UCI Machine Learning Repository. The experimental results show that DST-EC improves clustering accuracy by 15% and reduces similarity measurement time by 21% compared with state-of-the-art works.
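
The abstract does not give the formulas, so the following is only a minimal sketch of the two steps it names, under stated assumptions. It assumes the common real-valued form of the Dice coefficient, 2(x·y) / (|x|² + |y|²), and a simple greedy grouping rule in which a point joins the first cluster whose first member is at least `threshold`-similar to it. The names `dice_similarity` and `threshold_cluster`, the grouping rule, and the sample points are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def dice_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Real-valued Dice coefficient: 2*(x.y) / (|x|^2 + |y|^2) (assumed form)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sum(a * a for a in x) + sum(b * b for b in y)
    # Assumption: two all-zero vectors are treated as identical.
    return 1.0 if norm == 0 else 2.0 * dot / norm


def threshold_cluster(points: Sequence[Sequence[float]],
                      threshold: float) -> List[List[int]]:
    """Greedy grouping (illustrative only): a point joins the first cluster
    whose first member has Dice similarity >= threshold with it; otherwise
    it starts a new cluster. Returns clusters as lists of point indices."""
    clusters: List[List[int]] = []
    for i, p in enumerate(points):
        for members in clusters:
            if dice_similarity(points[members[0]], p) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


# Made-up sparse points, for illustration only.
points = [
    [1.0, 0.0, 0.0, 2.0],
    [1.0, 0.0, 0.0, 1.8],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 2.5, 0.1, 0.0],
]
clusters = threshold_cluster(points, threshold=0.8)
print(clusters)  # [[0, 1], [2, 3]]
```

Running the grouping with several thresholds would yield several partitions; how DST-EC combines them into an ensemble result is not described in the abstract and is not shown here.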

Bibliographic details

Source
International Journal of Applied Engineering Research, 2017, No. 2, 8 pages
Authors
R. Pushpalatha; K. Meenakshi Sundaram
Author affiliations

Computer Science, Erode Arts and Science College (Autonomous);

Department of Computer Science, Erode Arts and Science College (Autonomous)

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Basic engineering sciences
Keywords
Data mining; Clustering; Similarity; Threshold range; Dice similarity coefficient measurement; Similarity threshold ensemble clustering
