Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition

Abstract

Named Entity Recognition (NER) is an important first step for BioNLP tasks, e.g., gene normalization and event extraction. Recent NER systems employ supervised machine learning techniques to achieve high performance, and therefore require a manually annotated corpus in which every mention of the desired semantic types in a text is annotated. However, a great amount of human effort is necessary to build and maintain an annotated corpus. This study explores a method to build a high-performance NER system without a manually annotated corpus, using instead a comprehensible lexical database that stores numerous expressions of semantic types, together with a huge amount of unannotated text. We underscore the effectiveness of our approach by comparing the performance of NER systems trained on automatically acquired training data and on a manually annotated corpus.
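
The abstract does not spell out how the training data is acquired, so the following is only a rough illustration of the general idea of labelling unannotated text with a lexical database. It is not the authors' method. The function name `auto_annotate`, the toy lexicon, the greedy longest-match strategy, and the BIO tag scheme are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon type: lowercased token tuples mapped to a semantic type.
Lexicon = Dict[Tuple[str, ...], str]


def auto_annotate(tokens: List[str], lexicon: Lexicon) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in the lexicon."""
    max_len = max((len(key) for key in lexicon), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in lexicon:
                sem_type = lexicon[key]
                tags[i] = "B-" + sem_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + sem_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy data, invented for illustration only.
lexicon = {("tumor", "necrosis", "factor"): "Gene", ("p53",): "Gene"}
tokens = "Tumor necrosis factor activates p53 .".split()
tags = auto_annotate(tokens, lexicon)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Sentences labelled this way could then serve as training input for a supervised sequence tagger. Plain dictionary matching of this kind tends to produce noisy labels (ambiguous names, missing lexicon entries), which is presumably why the study compares against a manually annotated corpus.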

Bibliographic record

Source
Workshop on biomedical natural language processing | 2011 | 9 pages
Authors
Yu Usami; Han-Cheol Cho; Naoaki Okazaki; Junichi Tsujii
Original format: PDF
Chinese Library Classification: Program design, software engineering

Similar literature

1. A Semi-automatic and low-cost method to learn patterns for named entity recognition [J]. M. Marrero, J. Urbano. Natural language engineering. 2018, issue pta1
2. An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition [J]. Klesti Hoxha, Artur Baxhaku. Cybernetics and information technologies: CIT. 2017, issue 1
3. Automatic compilation of language resources for named entity recognition in Turkish by utilizing Wikipedia article titles [J]. Dilek Kuecuek. Computer standards & interfaces. 2015, issue sepa
4. Automatic Acquisition of Huge Training Data for Bio-Medical Named Entity Recognition [C]. Yu Usami, Han-Cheol Cho, Naoaki Okazaki. Workshop on biomedical natural language processing 2011. 2011
5. Improving named entity recognition with co-training and unlabeled bilingual data [D]. Ma, Xiaoyi. 2008
6. Increasing metadata coverage of SRA BioSample entries using deep learning–based named entity recognition [O]. Adam Klie, Brian Y Tsui, Shamim Mollah. 2021
7. Named entity recognition on bio-medical literature documents using hybrid based approach [O]. R. Ramachandran, K. Arutchelvan. 2021
8. Naming Forum: Proceedings of the IRDS Workshop on Data Entity Naming Conventions [R]. Newton, J. J. 1990
