Journal: Computational Biology and Bioinformatics, IEEE/ACM Transactions on

RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information


Abstract

We introduce RLIMS-P version 2.0, an enhanced rule-based information extraction (IE) system for mining kinase, substrate, and phosphorylation site information from scientific literature. Consisting of natural language processing and IE modules, the system integrates several new features, including the ability to process full-text articles and generalizability to different post-translational modifications (PTMs). To evaluate the system, sets of abstracts and full-text articles containing a variety of textual expressions were annotated. On the abstract corpus, the system achieved F-scores of 0.91, 0.92, and 0.95 for kinases, substrates, and sites, respectively. The corresponding scores on the full-text corpus were 0.88, 0.91, and 0.92. The system was additionally evaluated on the corpus of the 2013 BioNLP-ST GE task, where it achieved an F-score of 0.87 for the phosphorylation task, improving upon the results previously reported on that corpus. Full-scale processing of all abstracts in MEDLINE and all articles in the PubMed Central Open Access Subset has demonstrated scalability for mining rich information in literature, enabling the system's adoption for biocuration and knowledge discovery. The new system is generalizable and will be adapted to tackle other major PTM types. The RLIMS-P 2.0 system is available online (http://proteininformationresource.org/rlimsp/) and the developed corpora are available from iProLINK (http://proteininformationresource.org/iprolink/).
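To make the extraction task concrete, the following is a minimal sketch of what a single rule for kinase, substrate, and site extraction might look like. The pattern, the function name `extract_phosphorylation`, and the example sentence are all hypothetical illustrations; they are not taken from RLIMS-P, whose rules operate on the output of natural language processing modules rather than on raw strings.

```python
import re

# Hypothetical toy pattern; NOT one of RLIMS-P's actual rules.
# It matches sentences shaped like "<kinase> phosphorylates <substrate> at/on <site>".
PATTERN = re.compile(
    r"(?P<kinase>[A-Za-z0-9-]+) phosphorylates "
    r"(?P<substrate>[A-Za-z0-9-]+) (?:at|on) "
    r"(?P<site>(?:Ser|Thr|Tyr)-?\d+)"
)


def extract_phosphorylation(sentence):
    """Return a dict with kinase, substrate, and site, or None if no match."""
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None


# Illustrative sentence written for this sketch, not drawn from the evaluation corpora.
print(extract_phosphorylation("AKT1 phosphorylates GSK3B at Ser9."))
```

A single surface pattern like this misses passive voice, nominalizations ("phosphorylation of X by Y"), and arguments spread across sentences, which is the kind of textual variety the annotated corpora described above are meant to cover.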
