A data-driven approach to estimating the number of clusters in hierarchical clustering

Abstract

DNA microarray and gene expression problems often require researchers to cluster their data in order to better understand its structure. In cases where the number of clusters is not known, one can resort to hierarchical clustering methods. However, very few automated algorithms currently exist for determining the true number of clusters in the data. We propose two new methods (mode and maximum difference) for estimating the number of clusters in a hierarchical clustering framework, creating a fully automated process with no human intervention. These methods are compared to the established elbow and gap statistic algorithms using simulated datasets and the Biobase Gene ExpressionSet. We also explore a data mixing procedure inspired by cross-validation techniques. We find that the overall performance of the maximum difference method is comparable to or better than that of the gap statistic in multi-cluster scenarios, and that it achieves this performance at a fraction of the computational cost. This method also responds well to our mixing procedure, which opens the door to future research. We conclude that both the mode and maximum difference methods warrant further study of their mixing and cross-validation potential. We particularly recommend the maximum difference method in multi-cluster scenarios, given its accuracy and execution times, and present it as an alternative to existing algorithms.
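The abstract does not say how the mode or maximum difference estimates are computed, so the sketch below should not be read as the authors' method. It illustrates one generic way to pick a cluster count from a hierarchical clustering without human input: run single-linkage agglomeration, record the height of each merge, and cut the dendrogram at the largest jump between successive merge heights. The function names, the choice of single linkage, and the toy points are all assumptions made for illustration.

```python
import math


def single_linkage_heights(points):
    """Return the merge distances of single-linkage agglomerative
    clustering, in the order the merges happen (illustrative helper)."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest point-to-point distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
    return heights


def estimate_k_by_largest_gap(heights):
    """Estimate the number of clusters from the largest jump between
    successive merge heights. Generic heuristic, assumed for illustration;
    it needs at least two merges and never returns fewer than 2."""
    n = len(heights) + 1  # number of original points
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    # Stopping after merge t (0-based) leaves n - (t + 1) clusters.
    return n - (t + 1)


# Toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
heights = single_linkage_heights(points)
k = estimate_k_by_largest_gap(heights)
print(k)
```

On this toy input the six within-group merges all happen at distance 1, and the two between-group merges happen at roughly 12.7 and 13.5, so the largest jump separates the within-group merges from the between-group ones. The pairwise search is cubic or worse in the number of points and is only meant for small examples.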

Bibliographic record

Journal: other
Authors: Antoine E. Zambelli; Dylan M. Owen; Alok Sharma; Xin Zou
Year (volume), issue: -1(5), -1
Year: -1
Pages: ISCB Comm J-2809
Total pages: 13
Original format: PDF
Keywords: Clustering; Hierarchy; Dendrogram; Gene Expression; Empirical

Similar literature

1. A novel approach to estimate emissions from large transportation networks: Hierarchical clustering-based link-driving-schedules for EPA-MOVES using dynamic time warping measures [J]. Aziz H. M. Abdul, Ukkusuri Satish V. International journal of sustainable transportation. 2018, No. 1a5
2. Revealing Cluster Hierarchy in Gate-level ICs Using Block Diagrams and Cluster Estimates of Circuit Embeddings [J]. Cakir Burcin, Malik Sharad. ACM Transactions on Design Automation of Electronic Systems. 2019, No. 5
3. A Hierarchical Clustering Approach Based on Three-Dimensional Gray Relational Analysis for Clustering a Large Group of Decision Makers with Double Information [J]. Zhu Jianjun, Zhang Shitao, Chen Ye. Group decision and negotiation. 2016, No. 2
4. Kernel Hierarchical Agglomerative Clustering Comparison of Different Gap Statistics to Estimate the Number of Clusters [C]. Na Li, Nicolas Lefebvre, Regis Lengelle. International Conference on Pattern Recognition Applications and Methods. 2014
5. A layered, hierarchical approach to services for a single system image on heterogeneous clusters [D]. Collins, David E. 2001
6. How frequently do clusters occur in hierarchical clustering analysis? A graph theoretical approach to studying ties in proximity [O]. Wilmer Leal, Eugenio J. Llanos, Guillermo Restrepo. 2016
7. A Data-Driven Approach to Estimating the Number of Clusters in Hierarchical Clustering [O]. Zambelli, Antoine. 2016
