Journal: Computer Speech and Language

Mining methodologies from NLP publications: A case study in automatic terminology recognition



Abstract

Reviewing scientific publications and keeping up with the literature in a particular domain is extremely time-consuming. Extracting and exploring methodological information, in particular, requires a systematic understanding of the literature, but in practice it is often limited to the set of publications that an individual or group can review manually. Automated methodology identification could support systematic retrieval of relevant documents and exploration of developments within a given discipline. In this paper we present a system for identifying methodology mentions in scientific publications in the area of natural language processing, and in particular in automatic terminology recognition. The system comprises two major layers: the first automatically identifies methodological sentences; the second highlights methodological phrases (segments) within them. Each mention is assigned to one of four semantic categories: Task, Method, Resource/Feature and Implementation. Extraction and classification of the segments is formalised as a sequence tagging problem, and four separate phrase-based Conditional Random Fields are used to accomplish the task. The system has been evaluated on a manually annotated corpus comprising 45 full-text articles. At the segment level, the results show an F-measure of 53% for identification of Task and Method mentions (with 70% precision), whereas the F-measures for Resource/Feature and Implementation identification were 61% (with 67% precision) and 75% (with 86% precision), respectively. At the document level, the system achieved an F-measure of 72% (with 81% precision) for Task mentions, 60% (with 81% precision) for Method mentions, 74% (with 78% precision) for the Resource/Feature category and 79% (with 81% precision) for the Implementation category. We provide a detailed analysis of errors and explore the impact that particular groups of features have on the extraction of methodological segments.
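
The abstract reports F-measure and precision but not recall. As a rough reading aid, the sketch below back-computes the recall implied by each pair. It assumes the reported F-measure is the balanced F1 score, F = 2PR / (P + R), which the abstract does not state explicitly, and it treats "Task and Method" at the segment level as a single reported pair. The function name `implied_recall` and the `reported` table are illustrative only, and because the published figures are rounded to whole percentages, the results are approximate.

```python
def implied_recall(f_measure: float, precision: float) -> float:
    """Solve F = 2PR / (P + R) for R, assuming a balanced F1 score."""
    denominator = 2 * precision - f_measure
    if denominator <= 0:
        raise ValueError("no valid recall exists for this F/precision pair")
    return f_measure * precision / denominator


# (level, category) -> (F-measure, precision), copied from the abstract.
reported = {
    ("segment", "Task and Method"): (0.53, 0.70),
    ("segment", "Resource/Feature"): (0.61, 0.67),
    ("segment", "Implementation"): (0.75, 0.86),
    ("document", "Task"): (0.72, 0.81),
    ("document", "Method"): (0.60, 0.81),
    ("document", "Resource/Feature"): (0.74, 0.78),
    ("document", "Implementation"): (0.79, 0.81),
}

for (level, category), (f, p) in reported.items():
    r = implied_recall(f, p)
    print(f"{level:8s} {category:17s} F={f:.2f} P={p:.2f} implied R~{r:.2f}")
```

Under that assumption, precision is higher than the implied recall in every category, with the largest gap for Task and Method mentions at the segment level (implied recall of roughly 0.43 against 0.70 precision).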
