Conference: Workshop on biomedical natural language processing

Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition


Abstract

Named Entity Recognition (NER) is an important first step for BioNLP tasks such as gene normalization and event extraction. To achieve high performance with supervised machine learning techniques, recent NER systems require a manually annotated corpus in which every mention of the desired semantic types in a text is annotated. However, a great amount of human effort is necessary to build and maintain an annotated corpus. This study explores a method to build a high-performance NER system without a manually annotated corpus, using instead a comprehensible lexical database that stores numerous expressions of semantic types, together with a huge amount of unannotated text. We underscore the effectiveness of our approach by comparing the performance of NER systems trained on automatically acquired training data and on a manually annotated corpus.
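
The abstract does not say how lexicon entries are matched against the unannotated text, so the following is only a rough sketch of the general idea of acquiring training data automatically, not the authors' procedure. It assumes whitespace tokenization, case-insensitive greedy longest-match lookup, and BIO tags; the function name `auto_annotate`, the toy `LEXICON`, and the `Gene` label are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lowercased token sequences mapped to a semantic type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("tumor", "necrosis", "factor"): "Gene",
    ("p53",): "Gene",
}


def auto_annotate(
    tokens: List[str],
    lexicon: Dict[Tuple[str, ...], str],
    max_len: int = 5,
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in the lexicon.

    This is an assumed matching strategy for illustration only.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            sem_type = lexicon.get(tuple(lowered[i:i + n]))
            if sem_type is not None:
                tags[i] = f"B-{sem_type}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{sem_type}"
                matched = n
                break
        # Skip past the matched span, or advance by one token if nothing matched.
        i += matched if matched else 1
    return tags


sentence = "Tumor necrosis factor activates p53 in vitro".split()
tags = auto_annotate(sentence, LEXICON)
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

Token and tag pairs produced this way could be fed to any supervised sequence labeler in place of manual annotations. Plain dictionary matching is known to yield noisy labels (ambiguous names, missed variants), which is why comparing against a model trained on a manually annotated corpus, as the abstract describes, matters.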
