Journal of Data and Information Science

Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset

Abstract

Purpose: This work aims to normalize the NlpContributions scheme (henceforward, NlpContributionGraph) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper).

Design/methodology/approach: We re-annotate, a second time, the contributions-pertinent information across 50 prior-annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. To this end, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme.

Findings: The application of NlpContributionGraph on the 50 articles resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that with increased granularity of the information, the annotation decision variance is greater.

Research limitations: NlpContributionGraph has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, following which the same annotator re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a "single" set of structures and relationships as the final scheme. Given that the initial scheme is first proposed, and given the complexity of the annotation task in a realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model.

Practical implications: We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks.

Originality/value: NlpContributionGraph is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
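
The agreement figures reported in the abstract are F1-scores between the two annotation stages. As a rough illustration of how such a score can be computed, the sketch below compares two sets of annotated items by exact match. The exact-match criterion, the function name `f1_agreement`, and the toy triples are all assumptions made for illustration; the abstract does not state how items were matched, and none of the data below comes from the dataset.

```python
from typing import Hashable, Iterable


def f1_agreement(stage1: Iterable[Hashable], stage2: Iterable[Hashable]) -> float:
    """F1 between two annotation passes, treating stage 2 as the reference.

    Items count as agreeing only when they are identical (an assumed
    exact-match criterion, not necessarily the one used in the paper).
    """
    first, second = set(stage1), set(stage2)
    overlap = len(first & second)
    if overlap == 0:
        # Covers empty inputs and fully disjoint annotations.
        return 0.0
    precision = overlap / len(first)   # share of stage-1 items kept in stage 2
    recall = overlap / len(second)     # share of stage-2 items already in stage 1
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations: (subject, predicate, object) triples.
pilot_triples = {
    ("Contribution", "has", "Model"),
    ("Model", "uses", "attention"),
    ("Model", "trained on", "SQuAD"),
}
adjudicated_triples = {
    ("Contribution", "has", "Model"),
    ("Model", "uses", "self-attention"),
}

score = f1_agreement(pilot_triples, adjudicated_triples)
print(f"triple agreement F1: {score:.2%}")
```

Because F1 is symmetric in precision and recall, swapping which stage is treated as the reference gives the same score. Under exact matching, a triple agrees only if all three of its elements agree, which is one plausible reason agreement drops from sentences to phrases to triples.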
