Philosophical Transactions of the Royal Society B: Biological Sciences

The genetic code can cause systematic bias in simple phylogenetic models

Abstract

Phylogenetic analysis depends on inferential methods that accurately estimate the degree of divergence between sequences. Inaccurate estimates can lead to misleading evolutionary inferences, including incorrect tree topology estimates and poor dating of historical species divergence. Protein-coding sequences are ubiquitous in phylogenetic inference, but many of the standard methods commonly used to describe their evolution do not explicitly account for the dependencies between sites in a codon induced by the genetic code. This study evaluates the performance of several standard methods on datasets simulated under a simple substitution model that describes codon evolution under a range of different types of selective pressure. This approach also offers insights into the relative performance of different phylogenetic methods when there are dependencies between the sites in the data. Methods based on statistical models performed well when there was no or limited purifying selection in the simulated sequences (low degree of dependency between sites in a codon), although more biologically realistic models tended to outperform simpler models. Phylogenetic methods exhibited greater variability in performance for sequences simulated under strong purifying selection (high degree of dependency between sites in a codon). Simple models substantially underestimated the degree of divergence between sequences, and the underestimation was more pronounced on the internal branches of the tree. This underestimation resulted in some statistical methods performing poorly and exhibiting evidence of systematic bias in tree inference. Amino acid-based models and nucleotide models that contained generic descriptions of spatial and temporal heterogeneity, such as mixture and temporal hidden Markov models, coped notably better, producing more accurate estimates of evolutionary divergence and tree topology.
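The dependency between codon sites that the abstract refers to can be seen directly in the standard genetic code: whether a nucleotide change alters the encoded amino acid depends on the other two bases in the codon. The following is a minimal illustrative sketch, not the simulation model used in the study. It assumes the standard genetic code, ignores changes to or from stop codons, and weights all single-nucleotide changes equally; the function name `synonymous_fraction_by_position` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction_by_position():
    """Fraction of single-nucleotide changes between sense codons that are
    synonymous, reported separately for codon positions 1, 2 and 3.

    Assumption for illustration only: every single-nucleotide change is
    equally likely, and changes involving stop codons are excluded.
    """
    syn = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # skip stop codons as starting points
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == "*":
                    continue  # skip changes that create a stop codon
                total[pos] += 1
                syn[pos] += CODE[mutant] == aa
    return [s / t for s, t in zip(syn, total)]


if __name__ == "__main__":
    for pos, frac in enumerate(synonymous_fraction_by_position(), start=1):
        print(f"codon position {pos}: synonymous fraction = {frac:.3f}")
```

Under these assumptions no second-position change is synonymous, a small fraction of first-position changes are, and third-position changes are synonymous most often. Under purifying selection, the rate at which a site accepts changes therefore depends on its codon context, which is the kind of between-site dependency that models treating nucleotide sites as independent do not capture.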