Asia-Pacific Bioinformatics Conference

Fast motif recognition via application of statistical thresholds



Abstract

Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge with applications in detecting transcription factor binding sites in genomic data. Closely related to motif recognition is the Consensus String decision problem, which asks, given a parameter d and a set of length-ℓ strings S = {s_1, ..., s_n}, whether there exists a consensus string that has Hamming distance at most d from every string in S. A set of strings S is pairwise bounded if the Hamming distance between any pair of strings in S is at most 2d. It is trivial to determine whether a set is pairwise bounded, and a set cannot have a consensus string unless it is pairwise bounded. We use Consensus String to determine whether or not a pairwise bounded set has a consensus. Unfortunately, Consensus String is NP-complete. The lack of an efficient method for solving the Consensus String problem has made it a computational bottleneck in MCL-WMR, a motif recognition program capable of solving difficult motif recognition problem instances.

Results: We focus on the development of a method for solving Consensus String quickly with a small probability of error. We apply this heuristic to develop a new motif recognition program, sMCL-WMR, which has impressive accuracy and efficiency. We demonstrate the performance of sMCL-WMR in detecting weak motifs in large data sets and in real genomic data sets, and compare its performance to that of other leading motif recognition programs. In our preliminary discussion of our Consensus String algorithm we give insight into the issue of sampling pairwise bounded sets, and discuss its relevance to motif recognition.

Conclusion: Our novel heuristic yields a state-of-the-art program, sMCL-WMR, that is capable of detecting weak motifs in data sets with a large number of strings. sMCL-WMR is orders of magnitude faster than its predecessor MCL-WMR and is capable of solving previously unsolved synthetic motif recognition problems. Lastly, sMCL-WMR shows impressive accuracy in detecting transcription factor binding sites in the genomic data used in the assessment of Tompa et al.
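
The definitions in the Background can be illustrated with a small Python sketch. This is not the statistical heuristic of sMCL-WMR, which the abstract does not specify; it is a plain exhaustive search, exponential in the string length, intended only to show how the pairwise bounded test relates to the Consensus String question. The function names and the default DNA alphabet are assumptions made for illustration.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(strings, d):
    """True if every pair of strings is within Hamming distance 2d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(strings, 2))


def find_consensus(strings, d, alphabet="ACGT"):
    """Return a string within distance d of every input string, or None.

    Illustrative brute force (hypothetical helper): tries every string
    over the alphabet, so it is only usable for very short inputs.
    """
    # Being pairwise bounded is necessary for a consensus, so fail fast.
    if not strings or not is_pairwise_bounded(strings, d):
        return None
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        c = "".join(candidate)
        if all(hamming(c, s) <= d for s in strings):
            return c
    return None


# Pairwise bounded for d = 1, and a consensus exists.
with_consensus = ["AAT", "ACA", "CAA"]
# Also pairwise bounded for d = 1, but no string is within 1 of all four.
without_consensus = ["AAA", "CCA", "CAC", "ACC"]

print(find_consensus(with_consensus, 1))
print(find_consensus(without_consensus, 1))
```

The second set shows why the pairwise bounded test alone is not sufficient: every pair is within distance 2d, yet no consensus string exists, which is the case the Consensus String problem has to decide.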
