Journal: BMC Bioinformatics

The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities



Abstract

Background: The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of each of these three types. To our knowledge, no participant in this task has investigated this aspect of the situation. The present work specifically addresses issues raised by this situation: (i) how to detect these co-reference links and associated co-reference chains; (ii) how to use them to prepare positive and negative examples to train a supervised system for the detection of relations between entity mentions; (iii) what context around which entity mentions contributes to relation detection when co-reference chains are provided.

Results: We present experiments and results obtained both with gold entity mentions (task 2 of BioNLP-ST 2013) and with automatically detected entity mentions (end-to-end system, in task 3 of BioNLP-ST 2013). Our supervised mention detection system uses a linear-chain Conditional Random Fields classifier, and our relation detection system relies on a Logistic Regression (also known as Maximum Entropy) classifier. They use a set of morphological, morphosyntactic and semantic features. To minimize false inferences, co-reference resolution applies a set of heuristic rules designed to optimize precision. These rules take into account the types of the detected entity mentions, and take advantage of the didactic nature of the texts of the corpus, where a large proportion of bacteria naming is fairly explicit (although natural referring expressions such as "the bacteria" are common). The resulting system achieved a 0.495 F-measure on the official test set when taking as input the gold entity mentions, and a 0.351 F-measure when taking as input entity mentions predicted by our CRF system, both of which are above the best BioNLP-ST 2013 participant system.
Conclusions: We show that co-reference resolution substantially improves over a baseline system which does not use co-reference information: about 3.5 F-measure points on the test corpus for the end-to-end system (5.5 points on the development corpus) and 7 F-measure points on both development and test corpora when gold mentions are used. While this outperforms the best published system on the BioNLP-ST 2013 Bacteria Biotope dataset, we consider that it mainly provides a stronger baseline from which more work can be started. We also emphasize the importance and difficulty of designing a comprehensive gold standard co-reference annotation, which we explain is a key point to further progress on the task.
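
Question (ii) in the abstract, how co-reference chains can be used to prepare positive and negative training examples, can be illustrated with a minimal sketch. The abstract does not say how the authors build their examples, so everything below is an assumption made for illustration: the function names `chain_of` and `build_examples`, the type labels, the mention identifiers, and the labelling rule itself (a bacterium/location pair counts as positive if any mention in the bacterium's chain is annotated as related to any mention in the location's chain).

```python
from itertools import product


def chain_of(mention_id, chains):
    """Return the co-reference chain containing mention_id.

    A mention that belongs to no chain is treated as a chain of one.
    """
    for chain in chains:
        if mention_id in chain:
            return set(chain)
    return {mention_id}


def build_examples(mentions, chains, gold_relations):
    """Label every (bacterium, location) mention pair as positive or negative.

    mentions:       dict of mention id -> entity type (illustrative labels)
    chains:         list of sets of co-referent mention ids
    gold_relations: set of (bacterium id, location id) pairs annotated as related

    Assumed rule (not taken from the paper): a pair is positive if the
    annotated relation holds between any members of the two chains.
    """
    bacteria = sorted(m for m, t in mentions.items() if t == "Bacteria")
    locations = sorted(
        m for m, t in mentions.items() if t in ("Habitat", "Geographical")
    )
    examples = []
    for b, loc in product(bacteria, locations):
        b_chain = chain_of(b, chains)
        loc_chain = chain_of(loc, chains)
        positive = any(
            (x, y) in gold_relations for x in b_chain for y in loc_chain
        )
        examples.append((b, loc, positive))
    return examples


# Hypothetical toy document: T1 and T2 refer to the same bacterium.
mentions = {"T1": "Bacteria", "T2": "Bacteria", "T3": "Habitat", "T4": "Geographical"}
chains = [{"T1", "T2"}]
gold_relations = {("T1", "T3")}

examples = build_examples(mentions, chains, gold_relations)
for example in examples:
    print(example)
```

In this toy input only T1 is annotated as located in T3, yet the pair (T2, T3) is also labelled positive because T2 is co-referent with T1. Without the chain, that pair would be presented to the classifier as a negative example, which is the kind of false negative that co-reference information is meant to avoid.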
