Journal of Bioinformatics and Computational Biology

PREDICTION OF cis-REGULATORY ELEMENTS: FROM HIGH-INFORMATION CONTENT ANALYSIS TO MOTIF IDENTIFICATION

Abstract

One popular approach to predicting the binding motifs of transcription factors is to model the problem as a search for a group of l-mers (motifs), for some l > 0, one from each of the provided promoter regions of a group of co-expressed genes, that exhibit high information content when aligned without gaps. In our current work, we assume that these desired l-mers have evolved from a common ancestor, each differing from that ancestor in at most k positions, where k is substantially smaller than l. This implies that these l-mers belong to the k-neighborhood of their common ancestor, measured in terms of Hamming distance. If the ancestor were given, finding these l-mers would be trivial. Unfortunately, identifying the unknown ancestor is probably as hard as predicting the motifs themselves. Our goal is to identify a set of l-mers that slightly violates the k-neighborhood of a putative ancestor but captures all the desired motifs, which leads to an efficient way of identifying them. The main contributions of this paper fall into four areas: (a) we have derived nontrivial lower and upper bounds on the information content of a set of l-mers that differ from an unknown ancestor in no more than k positions; (b) we have defined a new distance between two sequences and, based on it, a k-pseudo-neighborhood that contains the Hamming-distance k-neighborhood of the to-be-defined ancestor; (c) we have developed an algorithm that, using the new distance, minimizes the sum of the distances between a predicted ancestor motif and a group of l-mers from the provided promoter regions; and (d) we have tested this algorithm, implemented as the software program PROMOCO, and compared its prediction performance with that of two other prediction programs. PROMOCO has been used to find all conserved motifs in a set of provided promoter sequences.
Our preliminary application of PROMOCO shows that it achieves better or comparable prediction results when compared with popular programs for identifying cis-regulatory binding motifs. A limitation of the algorithm is that it does not work well when the set of provided promoter sequences is too small or when the desired motifs appear in only a small portion of the given sequences.
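The abstract does not define PROMOCO's new distance or its optimization procedure, so the following Python sketch uses plain Hamming distance as a stand-in to make the underlying notions concrete: the information content of a gapless l-mer alignment (assuming a uniform background base distribution), membership in the Hamming k-neighborhood of an ancestor, and a putative ancestor chosen to minimize the summed distance to one l-mer per promoter. All function names (`hamming`, `in_k_neighborhood`, `information_content`, `majority_consensus`, `closest_lmer`, `refine_ancestor`) are hypothetical, and `refine_ancestor` is a simple alternating heuristic for illustration, not the PROMOCO algorithm.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def in_k_neighborhood(lmers, ancestor: str, k: int) -> bool:
    """True if every l-mer differs from `ancestor` in at most k positions."""
    return all(hamming(m, ancestor) <= k for m in lmers)


def information_content(lmers, background=None) -> float:
    """Information content (bits) of a gapless alignment of l-mers:
    per-column relative entropy against a background distribution,
    summed over columns. The uniform background is an assumption."""
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    n = len(lmers)
    total = 0.0
    for column in zip(*lmers):
        for base, count in Counter(column).items():
            f = count / n
            total += f * log2(f / background[base])
    return total


def majority_consensus(lmers) -> str:
    """Column-wise plurality string (alphabetical tie-break). For a fixed
    set of l-mers it minimizes the summed Hamming distance to them."""
    return "".join(max(sorted(set(col)), key=col.count) for col in zip(*lmers))


def closest_lmer(seq: str, center: str, l: int) -> str:
    """Leftmost l-mer of `seq` with minimum Hamming distance to `center`."""
    windows = (seq[i:i + l] for i in range(len(seq) - l + 1))
    return min(windows, key=lambda w: hamming(w, center))


def refine_ancestor(seqs, l: int, max_iter: int = 20):
    """Hypothetical heuristic (not PROMOCO): seed a putative ancestor with
    each l-mer of seqs[0], then alternate between picking the closest l-mer
    in every sequence and recomputing the consensus. Returns the
    (total distance, ancestor, chosen l-mers) triple with the lowest total."""
    best = None
    for i in range(len(seqs[0]) - l + 1):
        ancestor = seqs[0][i:i + l]
        for _ in range(max_iter):
            chosen = [closest_lmer(s, ancestor, l) for s in seqs]
            updated = majority_consensus(chosen)
            if updated == ancestor:
                break
            ancestor = updated
        chosen = [closest_lmer(s, ancestor, l) for s in seqs]
        score = sum(hamming(m, ancestor) for m in chosen)
        if best is None or score < best[0]:
            best = (score, ancestor, chosen)
    return best


if __name__ == "__main__":
    # Toy input: the 6-mer ACGTAC is planted once in each sequence.
    seqs = ["TTTTACGTACTTTT", "GGACGTACGGGG", "CCCCCACGTACC"]
    score, ancestor, chosen = refine_ancestor(seqs, 6)
    print(score, ancestor, chosen)  # 0 ACGTAC ['ACGTAC', 'ACGTAC', 'ACGTAC']
    print(information_content(chosen))  # 12.0
```

Neither step of the inner loop can increase the total Hamming distance (choosing the closest l-mer per sequence is optimal for a fixed ancestor, and the column-wise majority is optimal for fixed l-mers), so `max_iter` only guards against cycling between tied solutions. Seeding from every l-mer of the first sequence is the exhaustive part of the sketch, and its cost grows with sequence length.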