PREDICTION OF cis-REGULATORY ELEMENTS: FROM HIGH-INFORMATION CONTENT ANALYSIS TO MOTIF IDENTIFICATION

GUOJUN LI; JIZHU LU; VICTOR OLMAN; YING XU


Abstract

One popular approach to predicting transcription factor binding motifs models the problem as a search for a group of l-mers (motifs), for some l > 0, one from each of the provided promoter regions of a group of co-expressed genes, that exhibit high information content when aligned without gaps. In this work, we assume that the desired l-mers have evolved from a common ancestor and that each differs from that ancestor by mutations in at most k positions, where k is substantially smaller than l. This implies that the l-mers belong to the k-neighborhood of their common ancestor, measured in Hamming distance. If the ancestor were given, finding these l-mers would be trivial. Unfortunately, identifying the unknown ancestor is probably as hard as predicting the motifs themselves. Our goal is to identify a set of l-mers that may slightly violate the k-neighborhood of a putative ancestor but that captures all the desired motifs, which leads to an efficient way of identifying those motifs. The main contributions of this paper are fourfold: (a) we have derived nontrivial lower and upper bounds on the information content of a set of l-mers that differ from an unknown ancestor in no more than k positions; (b) we have defined a new distance between two sequences and, based on it, a k-pseudo-neighborhood that contains the Hamming-distance k-neighborhood of the yet-to-be-determined ancestor; (c) we have developed an algorithm, implemented as the software program PROMOCO, that uses the new distance to minimize the sum of the distances between a predicted ancestor motif and a group of l-mers from the provided promoter regions; and (d) we have tested PROMOCO and compared its prediction performance with that of two other prediction programs. PROMOCO has been used to find all conserved motifs in a set of provided promoter sequences. Our preliminary application shows that PROMOCO achieves better or comparable prediction results relative to popular programs for identifying cis-regulatory binding motifs. A limitation of the algorithm is that it does not work well when the set of provided promoter sequences is too small or when the desired motifs appear in only a small portion of the given sequences.
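
The two standard quantities the abstract relies on, Hamming distance to an ancestor and the information content of a gapless alignment, can be illustrated with a minimal sketch. This is not PROMOCO and does not implement the paper's new pseudo-Hamming distance, which the abstract does not define. The function names, the example sequences, the uniform background frequency of 0.25 per nucleotide, and the absence of pseudocounts are all assumptions made for illustration.

```python
import math
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length l-mers differ."""
    if len(a) != len(b):
        raise ValueError("l-mers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def in_k_neighborhood(lmer: str, ancestor: str, k: int) -> bool:
    """True if lmer differs from the ancestor in at most k positions."""
    return hamming(lmer, ancestor) <= k


def information_content(lmers: list[str], background: float = 0.25) -> float:
    """Information content (in bits) of a gapless alignment of l-mers.

    Assumes a uniform background frequency for every symbol and no
    pseudocounts; each column contributes sum_b f_b * log2(f_b / background).
    """
    n = len(lmers)
    total = 0.0
    for column in zip(*lmers):          # iterate over alignment columns
        for count in Counter(column).values():
            f = count / n               # observed frequency of this symbol
            total += f * math.log2(f / background)
    return total


# Hypothetical ancestor and l-mers (l = 8), each within k = 1 of the ancestor.
ancestor = "TGACGTCA"
lmers = ["TGACGTCA", "TGACGTAA", "TGTCGTCA", "AGACGTCA"]

distances = [hamming(m, ancestor) for m in lmers]
all_within_k = all(in_k_neighborhood(m, ancestor, k=1) for m in lmers)
ic = information_content(lmers)
```

Under these assumptions, a fully conserved column contributes 2 bits, so l-mers that stay close to a common ancestor yield an alignment whose information content approaches the maximum of 2l bits.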


Bibliographic record

Source: Journal of Bioinformatics and Computational Biology, 2007, Issue 4, 22 pages
Authors: GUOJUN LI; JIZHU LU; VICTOR OLMAN; YING XU
Original format: PDF
Language: English
Chinese Library Classification: Cell biology
Keywords: cis-regulatory elements; information content; pseudo-Hamming distance

