On the maximal cliques in c-max-tolerance graphs and their application in clustering molecular sequences

机译：关于c-max-tolerance图中的最大集团及其在分子序列聚类中的应用

代理获取

本网站仅为用户提供外文OA文献查询和代理获取服务，本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文，但由于OA文献来源多样且变更频繁，仍可能出现获取不到、文献不完整或与标题不符等情况，如果获取不到我们将提供退款服务。请知悉。

页面导航

摘要
著录项
相似文献
相关主题

摘要

Given a set S of n locally aligned sequences, it is a needed prerequisite to partition it into groups of very similar sequences to facilitate subsequent computations, such as the generation of a phylogenetic tree. This article introduces a new method of clustering which partitions S into subsets such that the overlap of each pair of sequences within a subset is at least a given percentage c of the lengths of the two sequences. We show that this problem can be reduced to finding all maximal cliques in a special kind of max-tolerance graph which we call a c-max-tolerance graph. Previously we have shown that finding all maximal cliques in general max-tolerance graphs can be done efficiently in O(n³+ out). Here, using a new kind of sweep-line algorithm, we show that the restriction to c-max-tolerance graphs yields a better runtime of O(n²log n + out). Furthermore, we present another algorithm which is much easier to implement, and though theoretically slower than the first one, is still running in polynomial time. We then experimentally analyze the number and structure of all maximal cliques in a c-max-tolerance graph, depending on the chosen c-value. We apply our simple algorithm to artificial and biological data and we show that this implementation is much faster than the well-known application Cliquer. By introducing a new heuristic that uses the set of all maximal cliques to partition S, we finally show that the computed partition gives a reasonable clustering for biological data sets.

机译：给定n个本地对齐序列的集合S，将其划分为非常相似的序列组以促进后续计算（例如系统发育树的生成）是必要的前提条件。本文介绍了一种新的聚类方法，该方法将S划分为子集，以使子集内每对序列的重叠至少是两个序列长度的给定百分比c。我们表明，可以将这个问题简化为在一种特殊的最大公差图中找到所有最大派系，我们称之为c-max-tolerance图。之前我们已经证明，在O（n ^{3 + out）中可以有效地找到一般最大公差图中的所有最大派系。在这里，使用一种新型的扫描线算法，我们证明了对c-max-tolerance图的限制产生了更好的O（n ^{2 log n + out）运行时间。此外，我们提出了另一种算法，该算法易于实现，尽管理论上比第一种算法慢，但仍在多项式时间内运行。然后，我们根据所选的c值，对c-max-tolerance图中的所有最大集团的数量和结构进行实验分析。我们将简单的算法应用于人工和生物数据，并且表明该实现比众所周知的应用程序Cliquer快得多。通过引入一种使用所有最大派系的集合进行分区的新启发式方法，我们最终表明，计算出的分区为生物数据集提供了合理的聚类。}}

著录项

期刊名称 Algorithms for Molecular Biology : AMB
作者
Katharina A Lehmann; Michael Kaufmann; Stephan Steigele; Kay Nieselt;
展开▼
作者单位

展开▼
年(卷),期 2006(1),-1
年度 2006
页码 9
总页数 17
原文格式 PDF
正文语种
中图分类应用微生物学;生化遗传学;生化药理学;
关键词

相似文献

外文文献
中文文献
专利

1. On the maximal cliques in c-max-tolerance graphs and their application in clustering molecular sequences [J] . Katharina A Lehmann, Michael Kaufmann, Stephan Steigele, Algorithms for Molecular Biology . 2006,第1期

机译：c-max-tolerance图中的最大集团及其在分子序列聚类中的应用
2. The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering | Science Publications [J] . Khalid Jaber, NurAini A. Rashid, Rosni Abdullah American journal of applied sciences . 2009,第7期

机译：蛋白质序列聚类的并行最大派系算法科学出版物
3. Degree sequence conditions for maximally edge-connected graphs depending on the clique number [J] . Peter Dankelmann, Lutz Volkmann Discrete mathematics . 2000,第1a3期

机译：取决于边序数的最大边连接图的次数序列条件
4. Parallel Metagenomic Sequence Clustering Via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clouds [C] . Yang Xiao, Zola Jaroslaw, Aluru Srinivas 2011 25th IEEE International Parallel Distributed Processing Symposium . 2011

机译：通过Map-Reduce云上的草图绘制和最大拟气候枚举进行并行元基因组序列聚类
5. On Maximal Green Sequences, Local-acyclicity, and Upper Cluster Algebras [D] . Mills, Matthew R. 2018

机译：关于最大绿色序列，局部非循环性和上聚类代数
6. Concise Single‐Step Synthesis of Sulfur‐Enriched Graphene: Immobilization of Molecular Clusters and Battery Applications [O] . Prof. Dr. Haruka Omachi, Tsukasa Inoue, Shuya Hatao, -1

机译：简捷的一步法合成富硫石墨烯：固定分子簇和电池应用
7. On the maximal cliques in -max-tolerance graphs and their application in clustering molecular sequences [O] . 2006

机译：-max-tolerance图中的最大集团及其在分子序列聚类中的应用
8. Listing All Maximal Cliques in Sparse Graphs in Near-optimal Time [R] . Eppstein, D., Loffler, M., Strash, D. 2011

机译：在近似最佳时间内列出稀疏图中的所有最大派系

On the maximal cliques in c-max-tolerance graphs and their application in clustering molecular sequences

摘要

著录项

相似文献

相关主题

期刊订阅