BMC Bioinformatics

Optimizing graph-based patterns to extract biomedical events from the literature


Abstract

In BioNLP-ST 2013

We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method searches for an approximate subgraph isomorphism between the key context dependencies of events and the graphs of input sentences. Our system addressed both the GENIA (GE) task, which focuses on 13 molecular-biology event types, and the Cancer Genetics (CG) task, which targets a challenging group of 40 cancer-biology event types whose arguments involve 18 kinds of biological entities. In addition to adapting our system to the two tasks, we attempted to integrate semantics into the graph matching scheme using a distributional similarity model in order to capture more events. We also evaluated how event extraction is affected by using paths of all possible lengths as key context dependencies instead of only the shortest paths. We achieved an F-score of 46.38% in the CG task (ranked 3rd) and 48.93% in the GE task (ranked 4th).

After BioNLP-ST 2013

In our previously published work, we explored three ways to further extend our event extraction system:
(1) We allowed non-essential nodes to be skipped and incorporated a node-skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm.
(2) Instead of assigning a single subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern.
(3) We applied the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization.

When evaluated on the official GE task test data, these extensions improved extraction precision from 62% to 65%. However, the overall F-score remained equivalent to the previous performance because recall dropped by 1%.
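Extensions (1) and (2) can be illustrated with a small sketch of an approximate subgraph distance. This is not the authors' implementation. It assumes a simplified distance made of three terms: a structural term comparing undirected hop counts between every pair of matched pattern nodes with the hop counts between their images in the sentence graph, a label term counting mismatched node labels, and a penalty of `W_SKIP` per skipped non-essential node. Edge labels and edge directions, which a real dependency-based matcher would also use, are ignored. The weights, the pattern fields (`labels`, `edges`, `essential`, `threshold`) and all function names are hypothetical. Mappings are enumerated by brute force, which is only practical for toy graphs like these.

```python
from collections import deque
from itertools import combinations, product

# Hypothetical weights: illustrative only, not values from the paper.
W_STRUCT, W_LABEL, W_SKIP = 1.0, 1.0, 0.5


def undirected(edges):
    """Adjacency sets for an undirected view of a directed edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hops(adj, src, dst):
    """Edge count of a shortest path from src to dst (BFS), or None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def contract(adj, skipped):
    """Delete skipped nodes and link their neighbours, so paths through them lose one hop."""
    adj = {n: set(nbrs) for n, nbrs in adj.items()}
    for s in skipped:
        nbrs = adj.pop(s, set())
        for n in nbrs:
            adj[n].discard(s)
        for a, b in combinations(nbrs, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def subgraph_distance(pattern, sentence, mapping):
    """Distance of one mapping: pattern node -> sentence node, or None if skipped."""
    skipped = [p for p, s in mapping.items() if s is None]
    kept = [p for p, s in mapping.items() if s is not None]
    p_adj = contract(undirected(pattern["edges"]), skipped)
    s_adj = undirected(sentence["edges"])
    label = sum(pattern["labels"][p] != sentence["labels"][mapping[p]] for p in kept)
    struct = 0
    for u, v in combinations(kept, 2):
        dp = hops(p_adj, u, v)
        if dp is None:
            continue  # unconnected in the pattern: no structural constraint
        ds = hops(s_adj, mapping[u], mapping[v])
        if ds is None:
            return float("inf")  # matched nodes are disconnected in the sentence
        struct += abs(ds - dp)
    return W_STRUCT * struct + W_LABEL * label + W_SKIP * len(skipped)


def best_distance(pattern, sentence):
    """Minimum distance over injective mappings; only non-essential nodes may be skipped."""
    p_ids = list(pattern["labels"])
    options = [list(sentence["labels"]) + ([] if p in pattern["essential"] else [None])
               for p in p_ids]
    best = float("inf")
    for combo in product(*options):
        used = [s for s in combo if s is not None]
        if len(used) == len(set(used)):  # no two pattern nodes share a sentence node
            best = min(best, subgraph_distance(pattern, sentence, dict(zip(p_ids, combo))))
    return best


def matches(pattern, sentence):
    """Each pattern carries its own threshold rather than one per event type."""
    return best_distance(pattern, sentence) <= pattern["threshold"]


# Toy pattern: trigger -> "of" -> PROTEIN, where "of" is non-essential.
pattern = {
    "labels": {"T": "phosphorylation", "P": "of", "A": "PROTEIN"},
    "edges": [("T", "P"), ("P", "A")],
    "essential": {"T", "A"},
    "threshold": 1.0,  # hypothetical learned per-pattern threshold
}
s1 = {"labels": {1: "phosphorylation", 2: "of", 3: "PROTEIN"}, "edges": [(1, 2), (2, 3)]}
s2 = {"labels": {1: "PROTEIN", 2: "phosphorylation"}, "edges": [(2, 1)]}
print(best_distance(pattern, s1), best_distance(pattern, s2), matches(pattern, s2))  # 0.0 0.5 True
```

The second sentence matches only because the non-essential "of" node is skipped at a cost of `W_SKIP`; a pattern whose learned threshold was below 0.5 would reject it, which is the kind of per-pattern control that extension (2) describes.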
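Extension (3) can be framed as choosing a subset of patterns that minimizes empirical risk plus a regularization term. The sketch below is an assumption, not the published procedure: each pattern is represented by the set of training-event identifiers it predicts, empirical risk is the number of false positives plus false negatives, the regularizer is `lam * len(selected)`, and the objective is minimized by greedy backward elimination. The names `risk`, `optimize_patterns` and `lam`, and the toy data, are hypothetical.

```python
def risk(selected, predictions, gold, lam):
    """Empirical risk (false positives + false negatives) plus lam * |selected|."""
    predicted = set().union(*(predictions[p] for p in selected))
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    return fp + fn + lam * len(selected)


def optimize_patterns(predictions, gold, lam=0.5):
    """Greedy backward elimination: drop a pattern whenever that lowers the risk."""
    selected = set(predictions)
    best = risk(selected, predictions, gold, lam)
    improved = True
    while improved:
        improved = False
        for p in sorted(selected):  # iterate over a sorted copy for determinism
            trial = selected - {p}
            r = risk(trial, predictions, gold, lam)
            if r < best:
                selected, best, improved = trial, r, True
                break  # restart the scan from the smaller set
    return selected, best


# Toy training data: gold event ids and the events each pattern extracts.
gold = {"e1", "e2", "e3"}
predictions = {"p1": {"e1", "e2"}, "p2": {"e2", "x1", "x2"}, "p3": {"e3"}}
chosen, total = optimize_patterns(predictions, gold)
print(sorted(chosen), total)  # ['p1', 'p3'] 1.0
```

Here dropping `p2` removes two false positives without losing a true positive (since `p1` also covers `e2`), so the risk falls from 3.5 to 1.0; removing either remaining pattern would add false negatives that outweigh the 0.5 regularization saving. Greedy elimination is only one way to minimize this objective and need not find the global optimum.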
