Sequence-based prediction of RNA-protein interactions.

Abstract

The interaction of RNAs with proteins is fundamental for executing many of the key roles they play in living systems, including translation, post-transcriptional regulation of gene expression, RNA splicing, and viral replication. Recently, new roles for RNA-protein interactions have emerged, following the discovery that the human genome is pervasively transcribed and produces thousands of non-coding RNAs (ncRNAs). Although the functions of many ncRNAs are not yet known, one emerging theme is that long non-coding RNAs (lncRNAs) often drive the formation of ribonucleoprotein (RNP) complexes, which in turn influence the regulation of gene expression. Although the human genome is predicted to encode almost as many different RNA-binding proteins as DNA-binding transcription factors, our current understanding of the cellular roles of RNA-binding proteins, how they recognize their targets, and how they are regulated lags far behind our understanding of transcription factors.

To improve our comprehension of RNA-protein recognition and the regulation of RNA-protein interaction networks within cells, this dissertation has four related goals: (i) performing a rigorous and systematic evaluation of sequence- and structure-based methods for predicting RNA-binding residues in proteins; (ii) developing an improved method for predicting interfacial residues in RNA-binding proteins, using only sequence information; (iii) generating a comprehensive collection of RNA-protein interaction motifs (RPIMs); and (iv) developing improved methods for RNA-protein interaction partner prediction.

First, we present a systematic evaluation of state-of-the-art machine learning methods for predicting RNA-binding residues in proteins, using three carefully curated benchmark datasets and a rich set of data representations. We show that sequence-based methods trained using position-specific scoring matrices (PSSMs) perform better than structure-based methods, which use more complex features extracted from the 3D structures of proteins. Second, we present RNABindRPlus, a new method for predicting RNA-binding residues in proteins, using only sequence information. The predictor combines output from an optimized Support Vector Machine (SVM) classifier with the output from a novel homology-based method (HomPRIP). We show that RNABindRPlus performs better than all currently available methods for predicting interfacial residues in proteins. Third, we extract more than 30,000 unique RNA-protein interfacial motifs (RPIMs), consisting of contiguous residues from both the RNA and protein chains of characterized RNA-protein complexes. Lastly, we demonstrate the utility of RPIMs in predicting RNA-protein interaction partners. We employ them in an innovative and significantly improved method for partner prediction and show that it has both a high true positive rate and a much lower false positive rate than other available methods. Taken together, the results presented here provide important new insights into the determinants of RNA-protein recognition, in addition to valuable new software tools for interrogating and predicting RNA-protein complexes and interaction networks.
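
As an illustration of the PSSM-driven, sequence-based prediction the abstract describes, the following is a minimal Python sketch of one common way to turn a PSSM into per-residue feature vectors: a sliding window over neighboring positions. The window encoding, the function name `window_features`, the window size, and the toy matrix are assumptions made for illustration; the abstract does not specify how the dissertation encodes PSSM features.

```python
from typing import List


def window_features(pssm: List[List[float]], window: int = 5) -> List[List[float]]:
    """Build one flat feature vector per residue from a PSSM.

    pssm   -- one row per residue, one column per amino-acid type
    window -- odd number of residues per window (hypothetical default)

    Positions that fall outside the sequence are padded with zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n_cols = len(pssm[0]) if pssm else 0
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        # Concatenate the PSSM rows of the residue and its neighbors.
        for j in range(i - half, i + half + 1):
            row = pssm[j] if 0 <= j < len(pssm) else pad
            vec.extend(row)
        features.append(vec)
    return features


# Toy PSSM (assumed values): 3 residues x 2 columns instead of the usual 20.
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_features = window_features(toy_pssm, window=3)
```

Each resulting vector could then be passed to a per-residue classifier such as an SVM, with a label indicating whether the central residue lies in the RNA-binding interface.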

Bibliographic record

  • Author

    Walia, Rasna Rani

  • Author affiliation

    Iowa State University

  • Degree-granting institution: Iowa State University
  • Subject: Bioinformatics
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 197 p.
  • Total pages: 197
  • Original format: PDF
  • Language of text: English
