
A figure-based system for extracting, archiving, and retrieving protein-protein interactions (PPIS) from biomedical literature.


Abstract

Proteins are complex biological polymers commonly considered the workhorses of cells; they mediate virtually all cellular functions. Correctly identifying and characterizing protein-protein interactions (PPIs) is an important task for thoroughly understanding the molecular mechanisms within cells. Although life science researchers have made great efforts to identify PPIs through experiments and to document them in publications, there is still no effective means of retrieving PPI data from the literature: manual curation of documents provides accurate results, but it is slow, tedious, and labor-intensive. In this thesis we present a comprehensive system that automatically extracts PPI-related information from biomedical articles by mining both textual and graphical information. Our framework aims to help life scientists accurately and efficiently curate relevant PPI information from the literature.

We first develop a solution for robustly harvesting figure-caption pairs from biomedical literature. Our approach relies on the idea that the PDF specification of the document layout can be used to identify encoded figures and figure boundaries within the PDF and to enforce constraints among figure regions. This allows us to harvest fragmented figures from the PDF, correctly identify subfigures that belong to the same figure, and identify the caption associated with each figure. Our method recovers figures and captions simultaneously and applies additional filtering steps to remove irrelevant figures such as logos, to eliminate text passages that were incorrectly identified as captions, and to regroup subfigures into a putative figure.

We then present a robust solution for automatically segmenting each figure into unimodal panels. Our approach analyzes figure captions to estimate the number of panels and then combines this estimate with geometric constraints obtained from the positions of the panel labels to extract the panels robustly. We further develop a hybrid image-text classification scheme to automatically identify the experimental evidence (methods) for PPIs in each panel. We store all processed results (the raw documents, their figures and captions, PPI methods, etc.) in a relational database and construct a new content-based image retrieval (CBIR) system called ePPI (Experimental PPIs). Life scientists can use cascaded queries in ePPI to easily and effectively retrieve PPI-related figures within each article and to capture essential facts (e.g., experimental methods) about specific PPI pairs.
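
The abstract does not say how captions are analyzed to estimate the panel count, so the following is only a minimal sketch of one plausible heuristic, not the thesis's method. It assumes panels are labeled in the caption with parenthesized letters such as "(A)" or ranges such as "(A-C)"; the function name `estimate_panel_count` and the label pattern are hypothetical.

```python
import re

# Assumed label style: "(A)", "(b)", or a range like "(A-C)" / "(A–C)".
# This pattern is an illustrative guess, not taken from the thesis.
PANEL_LABEL = re.compile(r"\(([A-Za-z])(?:\s*[-–]\s*([A-Za-z]))?\)")


def estimate_panel_count(caption: str) -> int:
    """Estimate how many panels a figure has from labels in its caption."""
    labels = set()
    for match in PANEL_LABEL.finditer(caption):
        first = match.group(1).upper()
        last = (match.group(2) or match.group(1)).upper()
        if first > last:
            continue  # ignore malformed ranges such as "(C-A)"
        # Expand a range like A-C into A, B, C; a single label expands to itself.
        for code in range(ord(first), ord(last) + 1):
            labels.add(chr(code))

    # Count the unbroken run of labels starting at "A"; a stray "(n)" or
    # similar parenthesized letter without an "(A)" is not treated as a label.
    count = 0
    while chr(ord("A") + count) in labels:
        count += 1
    return max(count, 1)  # a caption without labels describes a single panel


if __name__ == "__main__":
    caption = "(A-C) Yeast two-hybrid assays. (D) Pull-down of the complex."
    print(estimate_panel_count(caption))
```

In a pipeline like the one described, such an estimate would only be a prior: the abstract states that it is combined with geometric constraints from the detected panel-label positions before the panels are extracted.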

Bibliographic record

  • Author

    Lopez-Gutierrez, Luis D.

  • Author affiliation

    University of Delaware

  • Degree-granting institution: University of Delaware
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 130 p.
  • Original format: PDF
  • Language: English