Scientific Reports

Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer


Abstract

The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
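To make the feature-length question concrete, the following is a minimal sketch of two of the three quantities named in the abstract, computed for a range of k: the average number of k-mers shared between pairs of genomes, and the Shannon diversity index of each genome's k-mer profile. It is not the authors' pipeline. The function names (`kmer_counts`, `shannon_index`, `mean_shared_kmers`, `scan_k`), the choice to skip k-mers containing non-ACGT characters, and the use of the natural logarithm are all assumptions made for illustration. Cumulative relative entropy is omitted because the abstract does not define it.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations
from math import log


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT characters (an assumption)."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def shannon_index(counts: Counter) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over the k-mer frequencies of one genome."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log(n / total) for n in counts.values())


def mean_shared_kmers(profiles: list[Counter]) -> float:
    """Average number of distinct k-mers shared by each pair of genomes."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(len(a.keys() & b.keys()) for a, b in pairs) / len(pairs)


def scan_k(genomes: dict[str, str], k_values) -> dict[int, tuple[float, float]]:
    """For each k, return (mean shared k-mers per pair, mean Shannon index per genome)."""
    if not genomes:
        raise ValueError("at least one genome is required")
    result = {}
    for k in k_values:
        profiles = [kmer_counts(s, k) for s in genomes.values()]
        mean_h = sum(shannon_index(p) for p in profiles) / len(profiles)
        result[k] = (mean_shared_kmers(profiles), mean_h)
    return result
```

Calling `scan_k({"g1": seq1, "g2": seq2}, range(4, 16))` on real sequences would give one pair of values per k. As k grows, the number of shared k-mers between unrelated genomes tends to fall, so inspecting how these values change with k is one plausible way to narrow down a useful range.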
