Journal of Computational Biology

Joker de Bruijn: Covering k-Mers Using Joker Characters

Abstract

Sequence libraries that cover all k-mers enable universal and unbiased measurements of nucleotide and peptide binding. The shortest sequence that covers all k-mers over an alphabet Σ is a de Bruijn sequence of length |Σ|^k + k - 1. Researchers would like to increase k to measure interactions in greater detail, but they face a challenging problem: the number of k-mers grows exponentially in k, while the space on the experimental device is limited. In this study, we introduce a new approach that shrinks k-mer library sizes by using joker characters, each of which represents every character in the alphabet. In theory, joker characters can reduce the library size tremendously, but their use should be limited because the degeneracy they introduce lowers the statistical robustness of measurements. In this work, we consider the problem of generating a minimum-length sequence that covers a given set of k-mers using joker characters. The number and positions of the joker characters are provided as input. We first prove that the problem is NP-hard. We then present the first solution to the problem, which is based on two algorithmic innovations: (1) a greedy heuristic and (2) an integer linear programming (ILP) formulation. We first run the heuristic to find a good feasible solution, and then run an ILP solver to improve it. We ran our algorithm on DNA and amino acid alphabets to cover all k-mers for different values of k and k-mer multiplicity. The results demonstrate that it produces sequences that are very close to the theoretical lower bound.
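
As a rough illustration of the coverage idea in the abstract (not the paper's greedy heuristic or ILP formulation), the sketch below builds an ordinary de Bruijn sequence with the standard Lyndon-word construction and counts which k-mers a sequence covers when a joker character stands for every letter. The names `de_bruijn`, `covered_kmers`, and the joker symbol `N` are assumptions chosen for this example.

```python
from itertools import product

JOKER = "N"  # hypothetical joker symbol; the abstract does not name one


def de_bruijn(alphabet, k):
    """Return a linear de Bruijn sequence of length len(alphabet)**k + k - 1.

    Uses the standard Lyndon-word construction, then appends the first
    k - 1 characters so the cyclic sequence becomes a linear one.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    seq = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    cyclic = "".join(alphabet[i] for i in seq)
    return cyclic + cyclic[:k - 1]


def covered_kmers(seq, k, alphabet, joker=JOKER):
    """Return the set of k-mers covered by seq, expanding each joker
    to every character of the alphabet."""
    covered = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        choices = [alphabet if c == joker else c for c in window]
        covered.update("".join(p) for p in product(*choices))
    return covered


dna = "ACGT"
plain = de_bruijn(dna, 3)
print(len(plain), len(covered_kmers(plain, 3, dna)))  # 66 characters cover all 64 3-mers
print(sorted(covered_kmers("AN", 2, dna)))            # one joker window covers 4 2-mers
```

A window containing j jokers covers |Σ|^j k-mers at once, which is where the length savings come from; it is also why each measurement becomes degenerate, as the abstract cautions.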