Journal: Systematic Biology

Resolving Ambiguity of Species Limits and Concatenation in Multilocus Sequence Data for the Construction of Phylogenetic Supermatrices

Abstract

Public DNA databases are becoming too large and too complex for manual methods to generate phylogenetic supermatrices from multiple gene sequences. Delineating the terminals based on taxonomic labels is no longer practical because species identifications are frequently incomplete and gene trees are incongruent with Linnaean binomials, which results in uncertainty about how to combine species units among unlinked loci. We developed a procedure that minimizes the problem of forming multilocus species units in a large phylogenetic data set using algorithms from graph theory. An initial step established sequence clusters for each locus that broadly correspond to the species level. These clusters frequently include sequences labeled with various binomials and specimen identifiers that create multiple alternatives for concatenation. To choose among these possibilities, we minimize taxonomic conflict among the species units globally in the data set using a multipartite heuristic algorithm. The procedure was applied to all available GenBank data for Coleoptera (beetles) including > 10 500 taxon labels and > 23 500 sequences of 4 loci, which were grouped into 11 241 clusters or divergent singletons by the BlastClust software. Within each cluster, unidentified sequences could be assigned to a species name through the association with fully identified sequences, resulting in 510 new identifications (13.9% of total unidentified sequences) of which nearly half were "trans-locus" identifications by clustering of sequences at a secondary locus. The limits of DNA-based clusters were inconsistent with the Linnaean binomials for 1518 clusters (13.5%) that contained more than one binomial or split a single binomial among multiple clusters. By applying a scoring scheme for full and partial name matches in pairs of clusters, a maximum weight set of 7366 global species units was produced.
Varying the weights for partial matches had little effect on the number of units, although if partial matches were disallowed, the number increased greatly. Trees from the resulting supermatrices generally had topologies in good agreement with the higher taxonomy of Coleoptera, with fewer terminals than trees generated by standard filtering of sequences using species labels. The study illustrates a strategy for assembling the tree-of-life from an ever more complex primary database.
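
The abstract does not spell out the scoring scheme or the multipartite heuristic, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' implementation. It assumes that a full match is a shared binomial, that a partial match is a shared genus where at least one label lacks a species name (written `sp.`), and that two different binomials in the same genus score zero. The weights (1.0 and 0.5), the function names, and the toy clusters are all hypothetical. The merging step is a simple greedy heuristic: cluster pairs are visited from highest to lowest score and joined unless the resulting unit would contain two clusters from the same locus.

```python
from itertools import combinations


def is_identified(species):
    """True if the species part of a label is a real epithet (assumed convention)."""
    return species not in ("", "sp.")


def propagate_name(labels):
    """Return the single binomial found in a cluster, or None if ambiguous."""
    named = {l for l in labels if is_identified(l.partition(" ")[2])}
    return next(iter(named)) if len(named) == 1 else None


def name_score(labels_a, labels_b, full=1.0, partial=0.5):
    """Best name-match score between two clusters (illustrative weights)."""
    best = 0.0
    for a in labels_a:
        for b in labels_b:
            genus_a, _, sp_a = a.partition(" ")
            genus_b, _, sp_b = b.partition(" ")
            if genus_a != genus_b:
                continue
            if is_identified(sp_a) and is_identified(sp_b):
                if sp_a == sp_b:
                    best = max(best, full)  # same binomial
                # different binomials: treated as a conflict, no score
            else:
                best = max(best, partial)  # same genus, species unknown
    return best


def build_units(clusters, partial=0.5):
    """Greedily merge per-locus clusters into multilocus species units.

    clusters maps (locus, cluster_id) -> set of taxon labels.
    Returns a list of units, each a dict of locus -> cluster_id.
    """
    keys = sorted(clusters)
    unit_of = {k: i for i, k in enumerate(keys)}
    units = {i: {k[0]: k[1]} for i, k in enumerate(keys)}
    edges = []
    for a, b in combinations(keys, 2):
        if a[0] == b[0]:
            continue  # only clusters from different loci can be joined
        w = name_score(clusters[a], clusters[b], partial=partial)
        if w > 0:
            edges.append((w, a, b))
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))  # heaviest pairs first
    for _, a, b in edges:
        ua, ub = unit_of[a], unit_of[b]
        if ua == ub or units[ua].keys() & units[ub].keys():
            continue  # already joined, or would duplicate a locus
        merged = units.pop(ub)
        units[ua].update(merged)
        for locus, cid in merged.items():
            unit_of[(locus, cid)] = ua
    return list(units.values())


# Hypothetical toy data: three loci, six clusters.
clusters = {
    ("cox1", 0): {"Carabus granulatus"},
    ("cox1", 1): {"Carabus sp."},
    ("16S", 0): {"Carabus granulatus", "Carabus sp."},
    ("16S", 1): {"Agabus sp."},
    ("18S", 0): {"Carabus granulatus"},
    ("18S", 1): {"Agabus bipustulatus"},
}

print(len(build_units(clusters, partial=0.5)))  # 3
print(len(build_units(clusters, partial=0.0)))  # 4
```

In this toy case, setting the partial-match weight to zero leaves the two Agabus clusters unjoined, so the number of units rises from 3 to 4. That is the same direction of effect the abstract reports when partial matches are disallowed, although the toy data say nothing about its size in the real data set.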
