BMC Bioinformatics

CLAG: an unsupervised non hierarchical clustering algorithm handling biological data



Abstract

Background: Searching for similarities in a set of biological data is intrinsically difficult because some data points should not be clustered at all, while others should belong to several clusters. Under these hypotheses, hierarchical agglomerative clustering is not appropriate. Moreover, if the dataset is not sufficiently well characterized, as is often the case, supervised classification is not appropriate either.

Results: CLAG (for CLusters AGgregation) is an unsupervised non-hierarchical clustering algorithm designed to cluster a large variety of biological data and to provide a clustered matrix and numerical values indicating cluster strength. CLAG clusters correlation matrices for residues in protein families, gene-expression and miRNA data related to various cancer types, sets of species described by multidimensional vectors of characters, and binary matrices. It does not require all data points to cluster, and it converges, yielding the same result at each run. Its simplicity and speed allow it to run on reasonably large datasets.

Conclusions: CLAG can be used to investigate the cluster structure present in biological datasets and to identify its underlying graph. It proved more informative and accurate than several known clustering methods, such as hierarchical agglomerative clustering, k-means, fuzzy c-means, model-based clustering, and affinity propagation clustering, and it does not suffer from the convergence problem of the latter.
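Two properties claimed in the abstract, that not every data point has to join a cluster and that repeated runs give the same result, can be illustrated with a toy example. The sketch below is not the CLAG algorithm, whose procedure the abstract does not describe. It is a deliberately simple stand-in that thresholds a correlation matrix and takes connected components of the resulting graph. The function name `threshold_clusters`, the cutoff `tau`, and the example matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def threshold_clusters(corr, tau=0.8):
    """Toy stand-in (NOT CLAG): link items whose correlation is >= tau and
    return the connected components of that graph.

    Items with no strong partner are reported as unclustered instead of
    being forced into a group. No random initialization is involved, so
    the output is identical on every run.
    """
    n = corr.shape[0]
    # Adjacency of the thresholded graph, ignoring the diagonal.
    adj = (corr >= tau) & ~np.eye(n, dtype=bool)

    seen = [False] * n
    clusters, unclustered = [], []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            component.append(i)
            for j in np.flatnonzero(adj[i]):
                if not seen[j]:
                    seen[j] = True
                    stack.append(int(j))
        if len(component) > 1:
            clusters.append(sorted(component))
        else:
            unclustered.append(start)
    return clusters, unclustered


# Hypothetical 6 x 6 correlation matrix: items 0-2 and 3-4 are strongly
# correlated, item 5 is weakly correlated with everything.
corr = np.array([
    [1.00, 0.90, 0.85, 0.10, 0.00, 0.20],
    [0.90, 1.00, 0.70, 0.20, 0.10, 0.10],
    [0.85, 0.70, 1.00, 0.00, 0.30, 0.00],
    [0.10, 0.20, 0.00, 1.00, 0.95, 0.30],
    [0.00, 0.10, 0.30, 0.95, 1.00, 0.20],
    [0.20, 0.10, 0.00, 0.30, 0.20, 1.00],
])

clusters, unclustered = threshold_clusters(corr)
print(clusters)     # [[0, 1, 2], [3, 4]]
print(unclustered)  # [5]
```

In contrast, methods such as k-means assign every point to some cluster and depend on a random initialization, which is the kind of behavior the abstract argues against for biological data.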
