Information Theoretic Approaches to Whole Genome Phylogenies

Abstract

We describe a novel method for efficient reconstruction of phylogenetic trees from the sequences of whole genomes or proteomes. The core of our method is a new measure of pairwise distance between sequences whose lengths may vary greatly. The measure is based on an information-theoretic tool, the Kullback-Leibler relative entropy. We present an algorithm that computes these distances efficiently: using suffix arrays, it computes the distance between two sequences of length l in O(l) time. It is fast enough to construct the phylogenomic tree for hundreds of species and the phylogenomic forest for almost two thousand viruses. An initial analysis of the results shows remarkable agreement with the "acceptable phylogenetic truth". To assess our approach, we implemented it together with a number of alternative approaches, including two previously published in the literature. When each method's outcome was compared against a "traditional" tree using a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin.
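
The abstract names Kullback-Leibler relative entropy as the basis of the distance but does not give the formula or the suffix-array algorithm. As a rough illustration of the general idea only, the sketch below computes a symmetrized relative entropy between smoothed k-mer frequency distributions of two sequences. This is a simpler stand-in technique, not the measure or the O(l) algorithm the paper describes; the function names, the choice of k, the DNA alphabet, and the pseudocount are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def kmer_distribution(seq, k, alphabet="ACGT", pseudocount=1.0):
    """Smoothed k-mer frequency distribution of `seq` (hypothetical helper).

    The pseudocount keeps every probability positive so the relative
    entropy below is always finite. Windows containing characters
    outside `alphabet` are ignored.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p, q):
    """Kullback-Leibler relative entropy D(p || q) in bits."""
    return sum(p[m] * math.log2(p[m] / q[m]) for m in p)


def symmetric_kl_distance(a, b, k=2):
    """Average of D(p || q) and D(q || p) over k-mer distributions.

    Assumption: an illustrative stand-in, not the paper's distance.
    Because it compares frequencies rather than raw counts, the two
    sequences may have very different lengths.
    """
    p = kmer_distribution(a, k)
    q = kmer_distribution(b, k)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    x = "ACGTACGTACGTACGTACGT"
    y = "AAAAACCCCCAAAAACCCCCGGGTT"
    print(symmetric_kl_distance(x, x))  # identical sequences
    print(symmetric_kl_distance(x, y))  # sequences of different lengths
```

Applying such a function to every pair of genomes yields a distance matrix, which a distance-based tree-building method can then turn into a tree; the abstract does not say which tree-building method the authors use.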