INFORMS Journal on Computing

Computational Problems in Noisy SNP and Haplotype Analysis: Block Scores, Block Identification, and Population Stratification



Abstract

The study of haplotypes and their diversity in a population is central to disease-association research. We study several problems arising in haplotype block partitioning. Our objective function is the total number of distinct haplotypes in blocks. We show that the problem is NP-hard when there are errors or missing data, and provide approximation algorithms for several of its variants. We also give an algorithm that solves the problem with high probability under a probabilistic model that allows noise and missing data. In addition, we study the multipopulation case, where one has to partition the haplotypes into populations and seek a different block partition in each one. We provide a heuristic for that problem and use it to analyze simulated and real data. On simulated data, our blocks resemble the true partition more closely than the blocks generated by the LD-based algorithm of Gabriel et al. (2002). On single-population real data, we generate a more concise block description than do extant approaches, with better average LD within blocks. The algorithm also gives promising results on real two-population genotype data.
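To make the objective concrete, here is a minimal Python sketch of the block score described in the abstract: the number of distinct haplotypes in each block, summed over all blocks. It assumes clean, complete data (no errors, no missing entries), which is the setting outside the NP-hard case the abstract mentions. The names block_score and best_partition, the string encoding of haplotypes, and the simple dynamic program over block boundaries are illustrative assumptions, not the authors' algorithms.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, int]  # half-open SNP index range [start, end)


def block_score(haplotypes: Sequence[str], blocks: Sequence[Block]) -> int:
    """Sum, over all blocks, of the number of distinct haplotypes in that block."""
    return sum(len({h[start:end] for h in haplotypes}) for start, end in blocks)


def best_partition(haplotypes: Sequence[str]) -> Tuple[int, List[Block]]:
    """Minimise block_score by dynamic programming over block boundaries.

    Assumption: error-free data with no missing entries, all haplotypes
    of equal length. This is an illustrative sketch only.
    """
    n_snps = len(haplotypes[0])
    # best[j] = minimum score of any partition of the first j SNPs
    best = [0] + [float("inf")] * n_snps
    prev = [0] * (n_snps + 1)  # prev[j] = start of the last block ending at j
    for end in range(1, n_snps + 1):
        for start in range(end):
            distinct = len({h[start:end] for h in haplotypes})
            cost = best[start] + distinct
            if cost < best[end]:
                best[end], prev[end] = cost, start
    # Walk the back-pointers to recover the blocks in left-to-right order.
    blocks: List[Block] = []
    end = n_snps
    while end > 0:
        blocks.append((prev[end], end))
        end = prev[end]
    blocks.reverse()
    return best[n_snps], blocks


if __name__ == "__main__":
    # All eight haplotypes over three SNPs (toy data, not from the paper).
    haps = ["000", "001", "010", "011", "100", "101", "110", "111"]
    print(block_score(haps, [(0, 3)]))  # score of the single-block partition
    print(best_partition(haps))         # best score and its blocks
```

On this toy input, a single block covering all three SNPs contains eight distinct haplotypes, while splitting the SNPs into more than one block lowers the total, which is why the choice of block boundaries matters for this objective.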
