Journal: Genome Research

Accurate typing of short tandem repeats from genome-wide sequencing data and its applications


Abstract

Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution.
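The abstract names the approach "flank-based mapping" without spelling out the steps. One plausible reading is that an STR is first located inside each read, and the sequences on either side of it are then used for alignment, so the repeat length is read from the read itself rather than from a gapped alignment. The sketch below illustrates only that first step, under stated assumptions: the function name `find_strs`, its parameters, and the regular-expression approach are hypothetical and are not taken from STR-FM.

```python
import re


def find_strs(read, motif_len=3, min_repeats=4):
    """Locate tandem repeats in a read and split out their flanks.

    Hypothetical helper for illustration; not part of STR-FM.
    Returns a list of (left_flank, motif, repeat_count, right_flank).
    """
    # A motif of motif_len bases followed by at least (min_repeats - 1)
    # exact copies of itself, i.e. min_repeats units in total.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(read):
        motif = match.group(1)
        # Skip single-base runs such as "AAA" reported as a trinucleotide motif.
        if motif_len > 1 and len(set(motif)) == 1:
            continue
        repeat_count = len(match.group(0)) // motif_len
        left_flank = read[:match.start()]
        right_flank = read[match.end():]
        hits.append((left_flank, motif, repeat_count, right_flank))
    return hits


# Toy read: 8-bp flanks around six copies of CAG.
read = "ACGTTGCA" + "CAG" * 6 + "TGCATTGA"
for left, motif, count, right in find_strs(read):
    print(left, motif, count, right)
```

In a pipeline of this kind, the two flanks would then be handed to a read mapper and the repeat count compared with the reference at the mapped locus. This sketch requires exact repeat copies and filters only homopolymer motifs, so interrupted repeats or a dinucleotide run reported as a tetranucleotide motif would need extra handling.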
