Improved Recombination Lower Bounds for Haplotype Data

Abstract

Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. There has been extensive recent research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach to this problem is to estimate the minimum number of recombination events in any history of the sample. Myers and Griffiths recently proposed two measures, R_h and R_s, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing recombination lower bounds. Our principal results include:

1. We show that computing the lower bound R_h is NP-hard, and we adapt the greedy algorithm for the set cover problem to obtain a polynomial-time algorithm for computing a diversity-based bound R_g. This algorithm is several orders of magnitude faster than the Recmin program, and the bound R_g matches the bound R_h almost always.

2. We show, using a reduction from MAX-2SAT, that computing the lower bound R_s is also NP-hard. We give an O(m 2^n) time algorithm for computing R_s for a dataset with n haplotypes and m SNPs. We propose a new bound R_I, which extends the history-based bound R_s using the notion of intermediate haplotypes. This bound detects more recombination events than both the R_h and R_s bounds on many real datasets.

3. We extend our algorithms for computing R_g and R_s to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset than previous bounds and provide stronger evidence for the presence of a recombination hotspot.

4. We apply our lower bounds to a real dataset and demonstrate that they can provide a good indication of the presence and the location of recombination hotspots.
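As a rough illustration of the diversity-based idea, the sketch below computes the haplotype-count bound (number of distinct haplotypes, minus number of selected SNPs, minus one) on a chosen set of SNP columns, and then picks columns greedily. The greedy rule used here (add the column that most increases the number of distinct haplotypes) is an assumption made for illustration, not the paper's R_g algorithm, and the function names `haplotype_bound` and `greedy_bound` are hypothetical.

```python
def haplotype_bound(haplotypes, sites):
    """Distinct haplotypes on `sites`, minus len(sites), minus 1 (floored at 0)."""
    distinct = {tuple(h[j] for j in sites) for h in haplotypes}
    return max(0, len(distinct) - len(sites) - 1)


def greedy_bound(haplotypes):
    """Illustrative greedy column selection (an assumption, not the paper's R_g).

    Columns are added one at a time, always taking the one that yields the
    most distinct haplotypes so far; the best bound seen is returned.
    """
    m = len(haplotypes[0]) if haplotypes else 0
    chosen, remaining, best = [], set(range(m)), 0
    while remaining:
        def distinct_with(j):
            cols = chosen + [j]
            return len({tuple(h[k] for k in cols) for h in haplotypes})

        # sorted() makes tie-breaking deterministic (lowest column index wins)
        j = max(sorted(remaining), key=distinct_with)
        chosen.append(j)
        remaining.remove(j)
        best = max(best, haplotype_bound(haplotypes, chosen))
    return best


# Four haplotypes over two SNPs showing all four gametes: one recombination implied.
four_gametes = ["00", "01", "10", "11"]
# Same data with an extra non-varying first column.
padded = ["000", "001", "010", "011"]
```

On `padded`, using all three columns gives a bound of 0, while the greedy selection skips the uninformative column first and reaches 1, which is why choosing a subset of SNPs can matter.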
