Probability Model for Boundaries of Short-Read Sequencing

Abstract

The need for DNA sequencing has grown tremendously over the past few years. Current next-generation sequencing techniques produce huge amounts of data, but time and money remain limiting factors for researchers. Given a DNA sample, it is essential to produce enough reads to create or recreate a digital representation of the DNA while minimizing the resources required. This work proposes a theoretical model that yields a set of formulas for calculating, among other quantities, the expected distribution of contig lengths and the estimated N50 value for a low-coverage, short-read sequencing experiment. The formulas can be used as an extension of the well-known Lander-Waterman model for modeling assembly projects. They depend on only three input parameters: the DNA sequence length, the number of reads, and the read length. Because these boundaries (e.g., N50) can be calculated before sequencing begins, they can be used to estimate the feasibility of a sequencing project and to reduce or adjust the resources needed for resequencing or de novo assembly, so that the experiment yields enough, but not too much, information.
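The abstract does not reproduce the paper's own formulas, so the following is only a minimal sketch of the classic Lander-Waterman quantities that the proposed model extends, paired with a Monte Carlo check that drops reads uniformly at random on a genome, merges overlapping reads into contigs, and reports the simulated contig lengths and N50. It uses the same three inputs named above (genome length G, number of reads N, read length L). The function names, the minimum-overlap parameter `min_overlap` (T, with θ = T/L), the integer read start positions, and the choice to compute N50 against the total assembled length are all assumptions made for illustration; none of this should be read as the author's method.

```python
import math
import random


def lander_waterman(genome_len, n_reads, read_len, min_overlap=0):
    """Classic Lander-Waterman expectations (hypothetical helper; NOT the paper's extended formulas)."""
    c = n_reads * read_len / genome_len      # coverage c = N * L / G
    sigma = 1 - min_overlap / read_len       # sigma = 1 - theta, with theta = T / L
    return {
        "coverage": c,
        "expected_contigs": n_reads * math.exp(-c * sigma),
        "reads_per_contig": math.exp(c * sigma),
        "expected_contig_len": read_len * ((math.exp(c * sigma) - 1) / c + 1 - sigma),
        "fraction_covered": 1 - math.exp(-c),
    }


def simulate_contigs(genome_len, n_reads, read_len, min_overlap=0, seed=0):
    """Hypothetical simulation: place reads at uniform integer starts, merge reads overlapping by >= min_overlap."""
    if n_reads <= 0:
        return []
    rng = random.Random(seed)
    # Assumption: every read lies fully inside the genome (start in [0, G - L]).
    starts = sorted(rng.randint(0, genome_len - read_len) for _ in range(n_reads))
    contigs = []
    contig_start, contig_end = starts[0], starts[0] + read_len
    for s in starts[1:]:
        if contig_end - s >= min_overlap:    # overlap with the current contig is long enough: extend it
            contig_end = max(contig_end, s + read_len)
        else:                                # gap or too-short overlap: close the contig, start a new one
            contigs.append(contig_end - contig_start)
            contig_start, contig_end = s, s + read_len
    contigs.append(contig_end - contig_start)
    return contigs


def n50(lengths):
    """Largest length such that contigs at least this long hold >= half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    G, N, L = 1_000_000, 2_000, 100          # hypothetical low-coverage experiment (c = 0.2)
    expected = lander_waterman(G, N, L, min_overlap=20)
    contigs = simulate_contigs(G, N, L, min_overlap=20, seed=42)
    print("expected contigs:", expected["expected_contigs"], "simulated:", len(contigs))
    print("simulated N50:", n50(contigs))
```

Comparing the closed-form expected contig count with the simulated one is a quick sanity check on the assumptions; the simulation also yields a full contig-length distribution, from which N50 (or NG50, if computed against G instead of the assembled total) can be read off before committing sequencing resources.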