Fast motif recognition via application of statistical thresholds

Abstract

Background: Improving the accuracy and efficiency of motif recognition is an important computational challenge with applications to detecting transcription factor binding sites in genomic data. Closely related to motif recognition is the Consensus String decision problem, which asks, given a parameter d and a set of length-ℓ strings S = {s_1, ..., s_n}, whether there exists a consensus string that has Hamming distance at most d from every string in S. A set of strings S is pairwise bounded if the Hamming distance between any pair of strings in S is at most 2d. It is trivial to determine whether a set is pairwise bounded, and a set cannot have a consensus string unless it is pairwise bounded. We use Consensus String to determine whether or not a pairwise bounded set has a consensus. Unfortunately, Consensus String is NP-complete. The lack of an efficient method for solving the Consensus String problem has made it a computational bottleneck in MCL-WMR, a motif recognition program capable of solving difficult motif recognition problem instances.

Results: We focus on the development of a method for solving Consensus String quickly with a small probability of error. We apply this heuristic to develop a new motif recognition program, sMCL-WMR, which has impressive accuracy and efficiency. We demonstrate the performance of sMCL-WMR in detecting weak motifs in large data sets and in real genomic data sets, and compare its performance to that of other leading motif recognition programs. In our preliminary discussion of our Consensus String algorithm we give insight into the issue of sampling pairwise bounded sets, and discuss its relevance to motif recognition.

Conclusion: Our novel heuristic yields a state-of-the-art program, sMCL-WMR, that is capable of detecting weak motifs in data sets with a large number of strings. sMCL-WMR is orders of magnitude faster than its predecessor MCL-WMR and is capable of solving previously unsolved synthetic motif recognition problems. Lastly, sMCL-WMR shows impressive accuracy in detecting transcription factor binding sites in the genomic data used in the assessment of Tompa et al.
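
The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the heuristic used in sMCL-WMR, which the abstract does not specify. It is an exhaustive search, exponential in the string length, shown only to make the terms concrete. The function names and the default DNA alphabet "ACGT" are assumptions made for this illustration.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(strings, d):
    """True if every pair of strings is within Hamming distance 2d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(strings, 2))


def find_consensus(strings, d, alphabet="ACGT"):
    """Brute-force Consensus String check (illustrative only).

    Returns a string within Hamming distance d of every input string,
    or None if no such string exists. The search enumerates every
    candidate over the alphabet, so it is only usable on tiny inputs.
    """
    if not strings:
        raise ValueError("need at least one string")
    # Necessary condition: a set that is not pairwise bounded
    # cannot have a consensus string.
    if not is_pairwise_bounded(strings, d):
        return None
    length = len(strings[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, s) <= d for s in strings):
            return candidate
    return None


# A set with a consensus string for d = 1.
print(find_consensus(["AAA", "AAC", "ACA"], 1))

# A pairwise bounded set (all pairwise distances equal 2 = 2d)
# that nevertheless has no consensus string for d = 1.
hard = ["AAA", "ACC", "CAC", "CCA"]
print(is_pairwise_bounded(hard, 1), find_consensus(hard, 1))
```

The second example shows why the pairwise bounded test is only a necessary condition: the set passes it, yet no string lies within distance 1 of all four inputs, so a separate Consensus String check is still required.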

Bibliographic details

Source: Asia-Pacific Bioinformatics Conference, 2011, 8 pages
Authors: Christina Boucher; James King
Original format: PDF
Chinese Library Classification: Q811.4-532

Similar literature

1. Fast motif recognition via application of statistical thresholds [J]. Christina Boucher, James King. BMC Bioinformatics, 2010, Supplement 1.
2. FASTA-ELM: A fast adaptive shrinkage/thresholding algorithm for extreme learning machine and its application to gender recognition [J]. Mahmood Saif F., Marhaban Mohammad Hamiruce, Rokhani Fakhrul Zaman. Neurocomputing, 2017, Jan. 5 issue.
3. Length of the hypermutation motif DGYW/WRCH in the focus of statistical limits. Implications for a double-motif or extended motif recognition models. [J]. Kubrycht J, Sigler K. Journal of Theoretical Biology, 2008, Issue 1.
4. Fast motif recognition via application of statistical thresholds [C]. Christina Boucher, James King. Asia-Pacific Bioinformatics Conference, 2011.
5. Fast algorithms for computing statistics under interval uncertainty, with applications to computer science and to electrical and computer engineering [D]. Xiang, Gang. 2007.
6. Fast motif recognition via application of statistical thresholds [O]. Christina Boucher, James King. 2010.
7. Normalized Texture Motifs and Their Application to Statistical Object Modeling [R]. Newsam, S. D. 2004.
