Conference on Empirical Methods in Natural Language Processing

Entity Enhanced BERT Pre-training for Chinese NER



Abstract

Character-level BERT pre-trained on Chinese text lacks lexicon information, which has been shown to be effective for Chinese NER. To integrate the lexicon into pre-trained LMs for Chinese NER, we investigate a semi-supervised entity-enhanced BERT pre-training method. In particular, we first extract an entity lexicon from the relevant raw text using a new-word discovery method. We then integrate the entity information into BERT using a Char-Entity-Transformer, which augments self-attention with a combination of character and entity representations. In addition, an entity classification task helps inject the entity information into the model parameters during pre-training. The pre-trained models are used for NER fine-tuning. Experiments on a news dataset and two long-text NER datasets annotated by ourselves show that our method is highly effective and achieves the best results.
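The sketch below illustrates the general idea of augmenting character-level self-attention with entity information: characters that match an entry in the extracted entity lexicon carry an entity id, and the keys and values mix the character representation with the matched entity's embedding. The gating combination, the module name `CharEntitySelfAttention`, and the single attention head are illustrative assumptions, not the paper's exact Char-Entity-Transformer formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharEntitySelfAttention(nn.Module):
    """Single-head self-attention over characters where keys/values for
    characters covered by a lexicon-matched entity mix in that entity's
    embedding. Illustrative sketch only, not the paper's exact model."""

    def __init__(self, hidden_size: int, num_entities: int):
        super().__init__()
        # Index 0 is reserved for "no lexicon match" (an assumption).
        self.entity_emb = nn.Embedding(num_entities, hidden_size)
        self.q = nn.Linear(hidden_size, hidden_size)
        self.k = nn.Linear(hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, hidden_size)
        # Assumed gating scheme for the character/entity combination.
        self.gate = nn.Linear(2 * hidden_size, 1)

    def forward(self, char_hidden, entity_ids):
        # char_hidden: (batch, seq_len, hidden); entity_ids: (batch, seq_len)
        ent = self.entity_emb(entity_ids)                  # entity repr per character
        g = torch.sigmoid(self.gate(torch.cat([char_hidden, ent], dim=-1)))
        mixed = g * char_hidden + (1.0 - g) * ent          # character/entity mixture
        q = self.q(char_hidden)                            # queries stay character-level
        k, v = self.k(mixed), self.v(mixed)                # keys/values see entity info
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v


# Example: a 6-character sentence where characters 2-4 match lexicon entity id 7.
attn = CharEntitySelfAttention(hidden_size=768, num_entities=10000)
chars = torch.randn(1, 6, 768)
entity_ids = torch.tensor([[0, 0, 7, 7, 7, 0]])            # 0 = no lexicon match
out = attn(chars, entity_ids)                              # (1, 6, 768)
```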
