PLoS Computational Biology

One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies



Abstract

With the rapid decline in sequencing cost, researchers today are rushing to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This step matters because incorrectly mapped reads affect downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, most of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that uses genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice. It can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
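The abstract does not say how RefEditor builds the diploid reference internally. As an illustration of the general idea only, the following is a minimal sketch that turns one haploid reference sequence into two haplotype sequences from genotypes known in advance. It assumes phased, single-nucleotide genotypes keyed by 0-based position; the function name `personalize_reference` and its input layout are hypothetical and are not taken from RefEditor.

```python
from typing import Dict, Tuple


def personalize_reference(
    reference: str,
    genotypes: Dict[int, Tuple[str, str]],
) -> Tuple[str, str]:
    """Build two haplotype sequences from a haploid reference.

    `genotypes` maps a 0-based position to a pair of alleles
    (allele on haplotype 1, allele on haplotype 2).
    Assumption of this sketch: only single-nucleotide substitutions
    are handled, and the genotypes are already phased.
    """
    hap1 = list(reference)
    hap2 = list(reference)
    for pos, (allele1, allele2) in genotypes.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        if len(allele1) != 1 or len(allele2) != 1:
            raise ValueError("only single-nucleotide alleles are supported")
        # Substitute the known allele on each haplotype independently.
        hap1[pos] = allele1
        hap2[pos] = allele2
    return "".join(hap1), "".join(hap2)


# Toy example: one heterozygous site (position 1) and one
# homozygous non-reference site (position 4).
reference = "ACGTACGT"
genotypes = {1: ("C", "T"), 4: ("G", "G")}
hap1, hap2 = personalize_reference(reference, genotypes)
print(hap1)  # ACGTGCGT
print(hap2)  # ATGTGCGT
```

Reads from this individual could then be mapped against both haplotype sequences instead of the universal reference. Indels, unphased genotypes and coordinate conversion back to the universal reference are deliberately left out of this sketch.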
