Sentence, Phrase, and Triple Annotations to Build a Knowledge Graph of Natural Language Processing Contributions—A Trial Dataset

Jennifer D’Souza; Sören Auer

Abstract

Purpose: This work aims to normalize the NlpContributions scheme (henceforward, NlpContributionGraph) to structure, directly from article sentences, the contributions information in Natural Language Processing (NLP) scholarly articles via a two-stage annotation methodology: 1) a pilot stage, to define the scheme (described in prior work); and 2) an adjudication stage, to normalize the graphing model (the focus of this paper).

Design/methodology/approach: We re-annotate the contributions-pertinent information across 50 previously annotated NLP scholarly articles in terms of a data pipeline comprising contribution-centered sentences, phrases, and triple statements. Specifically, care was taken in the adjudication annotation stage to reduce annotation noise while formulating the guidelines for our proposed novel NLP contributions structuring and graphing scheme.

Findings: Applying NlpContributionGraph to the 50 articles resulted in a dataset of 900 contribution-focused sentences, 4,702 contribution-information-centered phrases, and 2,980 surface-structured triples. The intra-annotation agreement between the first and second stages, in terms of F1-score, was 67.92% for sentences, 41.82% for phrases, and 22.31% for triple statements, indicating that annotation decision variance grows as the granularity of the information increases.

Research limitations: NlpContributionGraph has limited scope for structuring scholarly contributions compared with STEM (Science, Technology, Engineering, and Medicine) scholarly knowledge at large. Further, the annotation scheme in this work is designed by only an intra-annotator consensus: a single annotator first annotated the data to propose the initial scheme, and the same annotator then re-annotated the data to normalize the annotations in an adjudication stage. However, the expected goal of this work is to achieve a standardized retrospective model of capturing NLP contributions from scholarly articles. This would entail a larger initiative of enlisting multiple annotators to accommodate different worldviews into a “single” set of structures and relationships as the final scheme. Given that the initial scheme is only now being proposed, and given the complexity of the annotation task within a realistic timeframe, our intra-annotation procedure is well-suited. Nevertheless, the model proposed in this work is presently limited since it does not incorporate multiple annotator worldviews. This is planned as future work to produce a robust model.

Practical implications: We demonstrate NlpContributionGraph data integrated into the Open Research Knowledge Graph (ORKG), a next-generation KG-based digital library with intelligent computations enabled over structured scholarly knowledge, as a viable aid to assist researchers in their day-to-day tasks.

Originality/value: NlpContributionGraph is a novel scheme to annotate research contributions from NLP articles and integrate them in a knowledge graph, which to the best of our knowledge does not exist in the community. Furthermore, our quantitative evaluations over the two-stage annotation tasks offer insights into task difficulty.
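The F1-based agreement between the two annotation stages, reported in the abstract, can be illustrated with a minimal sketch. It assumes exact matching of annotation items and treats the adjudicated (second-stage) annotations as the reference; the abstract does not state the matching criterion, so both choices are assumptions. The function name `f1_agreement` and the toy triples are hypothetical and are not taken from the dataset.

```python
def f1_agreement(stage1: set, stage2: set) -> float:
    """F1 of stage-1 annotations scored against stage-2 as the reference.

    Assumption: two annotations agree only if they match exactly.
    """
    if not stage1 or not stage2:
        return 0.0
    overlap = len(stage1 & stage2)
    precision = overlap / len(stage1)   # share of stage-1 items kept in stage 2
    recall = overlap / len(stage2)      # share of stage-2 items already in stage 1
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy triples (subject, predicate, object) for one article.
pilot = {
    ("model", "uses", "BiLSTM"),
    ("model", "achieves", "F1 of 90"),
    ("dataset", "has", "50 articles"),
}
adjudicated = {
    ("model", "uses", "BiLSTM"),
    ("approach", "achieves", "F1 of 90"),
}

score = f1_agreement(pilot, adjudicated)
print(f"{score:.2%}")
```

Under exact matching, changing a single element of a triple (here, "model" to "approach") counts as a full disagreement. That is one plausible reading of why agreement falls from sentences to phrases to triples as granularity increases.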


Bibliographic details

Source: Journal of Data and Information Science, 2021, Issue 3, 29 pages
Authors: Jennifer D’Souza; Sören Auer
Original format: PDF
Chinese Library Classification: Library science, librarianship
Keywords: Scholarly knowledge graphs; Open science graphs; Knowledge representation; Natural language processing; Semantic publishing

