International Conference on Computational Linguistics

Exploring Cross-sentence Contexts for Named Entity Recognition with BERT



Abstract

Named entity recognition (NER) is frequently addressed as a sequence classification task with each input consisting of one sentence of text. It is nevertheless clear that useful information for NER is often also found elsewhere in the text. Recent self-attention models such as BERT can both capture long-distance relationships in the input and represent inputs consisting of several sentences, creating opportunities for incorporating cross-sentence information into natural language processing tasks. This paper presents a systematic study of the use of cross-sentence information for NER with BERT models in five languages. We find that adding context as additional sentences to BERT input systematically increases NER performance. Including multiple sentences in each input sample also allows us to study the predictions for the same sentence in different contexts. We propose a straightforward method, Contextual Majority Voting (CMV), to combine these different predictions, and demonstrate that it further increases NER performance. Evaluation on established datasets, including the CoNLL'02 and CoNLL'03 NER benchmarks, demonstrates that our proposed approach improves on the state-of-the-art NER results for English, Dutch, and Finnish, achieves the best reported BERT-based results for German, and is on par with other BERT-based approaches for Spanish. We release all methods implemented in this work under open licenses.
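The abstract describes combining, via majority voting, the per-token predictions made for the same sentence when it appears in several different context windows. A minimal sketch of that voting step is below; this is an illustration of the general idea, not the authors' released implementation, and the `contextual_majority_vote` name and the BIO tag sequences are hypothetical.

```python
from collections import Counter

def contextual_majority_vote(predictions):
    """Combine per-token NER tag predictions for one sentence that was
    predicted in several different context windows.

    predictions: list of tag sequences, one per context window, all of
    equal length (one tag per token of the sentence).
    Returns the per-token majority-voted tag sequence.
    """
    assert predictions and all(len(p) == len(predictions[0]) for p in predictions)
    voted = []
    for token_tags in zip(*predictions):
        # most_common(1) yields the tag with the highest vote count;
        # ties are broken by first occurrence among the window predictions.
        tag, _ = Counter(token_tags).most_common(1)[0]
        voted.append(tag)
    return voted

# The same 4-token sentence as predicted in three context windows
# (hypothetical CoNLL-style BIO labels):
windows = [
    ["B-PER", "I-PER", "O", "O"],
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-ORG", "I-PER", "O", "B-LOC"],
]
print(contextual_majority_vote(windows))  # → ['B-PER', 'I-PER', 'O', 'B-LOC']
```

In this sketch each window contributes one equal vote per token; the paper's actual CMV procedure may weight or select predictions differently.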

