Building an Indonesian named entity recognizer using Wikipedia and DBPedia

Abstract

This paper describes the development of an Indonesian named entity recognition (NER) system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and uses training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. each word or phrase that has a hyperlink, is tagged according to information obtained from DBPedia. In this first version, we are interested in only three entity types: Person, Place, and Organization. The system is evaluated using cross-validation and also against a manually annotated gold standard. Under cross-validation, our Indonesian NER obtains precision and recall values above 90%, whereas the evaluation against the gold standard shows that it achieves high precision but very low recall.
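The tagging step described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `DBPEDIA_TYPES` dictionary is a hypothetical stand-in for a real DBPedia type lookup, the example sentence is made up, and the output uses the two-column tab-separated layout (token, label) that Stanford NER training files commonly follow.

```python
import re

# Hypothetical stand-in for a DBPedia type lookup: page title -> entity class.
DBPEDIA_TYPES = {
    "Joko Widodo": "Person",
    "Jakarta": "Place",
    "Universitas Indonesia": "Organization",
}

# Matches wiki links of the form [[Target]] or [[Target|anchor text]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def tag_sentence(wikitext, types=DBPEDIA_TYPES):
    """Return (token, label) pairs; unlinked or unknown tokens get 'O'."""
    pairs = []
    pos = 0
    for m in WIKILINK.finditer(wikitext):
        # Plain text before the link lies outside any entity.
        pairs += [(tok, "O") for tok in wikitext[pos:m.start()].split()]
        target = m.group(1).strip()
        anchor = (m.group(2) or target).strip()
        # The link target decides the label; the anchor text supplies the tokens.
        label = types.get(target, "O")
        pairs += [(tok, label) for tok in anchor.split()]
        pos = m.end()
    # Remaining plain text after the last link.
    pairs += [(tok, "O") for tok in wikitext[pos:].split()]
    return pairs


def to_training_lines(pairs):
    """Serialize as one 'token<TAB>label' line per token."""
    return "\n".join(f"{tok}\t{label}" for tok, label in pairs)


# Made-up example sentence with three hyperlinked phrases.
sentence = "[[Joko Widodo|Jokowi]] lahir di [[Surakarta]] dan bekerja di [[Jakarta]] ."
print(to_training_lines(tag_sentence(sentence)))
```

In this sketch a hyperlinked phrase whose target is missing from the lookup (here `Surakarta`) falls back to the `O` label, so the quality of the generated training data depends directly on how complete the type lookup is.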

Bibliographic details

Source
International Conference on Asian Language Processing, 2014, pp. 19-22 (4 pages)
Authors
Luthfi A.; Distiawan B.; Manurung R.
Original format: PDF
Keywords
Web sites; natural language processing; DBPedia; Indonesian NER system; Indonesian named entity recognizer; Stanford NER system; Wikipedia; cross fold validation; organization entity; person entity; place entity; precision value; recall value; Data models; Electronic publishing; Encyclopedias; Internet; Tagging; Training data; dbpedia; name entity recognition; stanford ner; wikipedia

Similar literature

1. Disambiguating the Twitter Stream Entities and Enhancing the Search Operation Using DBpedia Ontology: Named Entity Disambiguation for Twitter Streams [J]. N. Senthil Kumar, Dinakaran Muruganantham. International Journal of Information Technology and Web Engineering. 2016, No. 2
2. Automatically building large-scale named entity recognition corpora from Chinese Wikipedia [J]. Jie Zhou, Bi-cheng Li, Gang Chen. Frontiers of Information Technology & Electronic Engineering. 2015, No. 11
3. Automatically building large-scale named entity recognition corpora from Chinese Wikipedia [J]. Jie Zhou, Bi-cheng Li, Gang Chen. Journal of Zhejiang University (English Edition), Series C: Computers and Electronics. 2015, No. 11
4. Building an Indonesian named entity recognizer using Wikipedia and DBPedia [C]. Luthfi A., Distiawan B., Manurung R. International Conference on Asian Language Processing. 2014
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. CheNER: chemical named entity recognizer [O]. Anabel Usié, Rui Alves, Francesc Solsona
7. Improving Multilingual Named Entity Recognition with Wikipedia Entity Type Mapping [O]. Ni, Jian; Florian, Radu. 2017
