
Coverage theories for metagenomic DNA sequencing based on a generalization of Stevens’ theorem


Abstract

Metagenomic project design has relied variously upon speculation, semi-empirical and ad hoc heuristic models, and elementary extensions of single-sample Lander–Waterman expectation theory, all of which are demonstrably inadequate. Here, we propose an approach based upon a generalization of Stevens’ Theorem for randomly covering a domain. We extend this result to account for the presence of multiple species, from which are derived useful probabilities for fully recovering a particular target microbe of interest and for average contig length. These show improved specificities compared to older measures and recommend deeper data generation than the levels chosen by some early studies, supporting the view that poor assemblies were due at least somewhat to insufficient data. We assess predictions empirically by generating roughly 4.5 Gb of sequence from a twelve member bacterial community, comparing coverage for two particular members, Selenomonas artemidis and Enterococcus faecium, which are the least (3 %) and most (12 %) abundant species, respectively. Agreement is reasonable, with differences likely attributable to coverage biases. We show that, in some cases, bias is simple in the sense that a small reduction in read length to simulate less efficient covering brings data and theory into essentially complete accord. Finally, we describe two applications of the theory. One plots coverage probability over the relevant parameter space, constructing essentially a “metagenomic design map” to enable straightforward analysis and design of future projects. The other gives an overview of the data requirements for various types of sequencing milestones, including a desired number of contact reads and contig length, for detection of a rare viral species.
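For orientation, the classical single-genome form of Stevens' theorem can be evaluated directly. The sketch below is not the multi-species generalization the abstract describes, whose formulas are not given here. It assumes n reads, each covering a fraction a of a circular genome of unit circumference, placed uniformly at random, and returns the probability that the genome is completely covered: the sum over k from 0 to floor(1/a) of (-1)^k C(n, k) (1 - k a)^(n-1). The function name and the choice of exact rational arithmetic are illustrative assumptions; exact fractions are used because the alternating sum loses precision in floating point when n is large.

```python
from fractions import Fraction
from math import comb, floor


def stevens_full_coverage(n, a):
    """Probability that n random arcs, each of length a (as a fraction of a
    unit-circumference circle), cover the whole circle.

    Classical Stevens (1939) result only; hypothetical helper for illustration.
    Pass `a` as a Fraction (or a string such as "3/5") to keep the sum exact.
    """
    a = Fraction(a)
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")

    # Terms with 1 - k*a < 0 are dropped, so k stops at floor(1/a);
    # the binomial coefficient also vanishes for k > n.
    k_max = min(n, floor(1 / a))
    return sum(
        (-1) ** k * comb(n, k) * (1 - k * a) ** (n - 1)
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    # Illustrative numbers only: 60 reads, each spanning 1/10 of the genome.
    p = stevens_full_coverage(60, Fraction(1, 10))
    print(float(p))
```

In sequencing terms, a would be the read length divided by the genome length. The arithmetic cost of exact fractions grows with n, so realistic read counts would need a numerically stable approximation rather than this direct sum.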
