International Conference on Asian Language Processing

Building an Indonesian named entity recognizer using Wikipedia and DBPedia



Abstract

This paper describes the development of an Indonesian named entity recognition (NER) system using online data such as Wikipedia and DBPedia. The system is based on the Stanford NER system [8] and uses training documents constructed automatically from Wikipedia. Each entity in the Wikipedia documents, i.e. each word or phrase that has a hyperlink, is tagged according to information obtained from DBPedia. In this first version, we are interested in only three entity types: Person, Place, and Organization. The system is evaluated using cross-validation and against a manually annotated gold standard. Under cross-validation, our Indonesian NER obtains precision and recall values above 90%, whereas evaluation against the gold standard shows that it achieves high precision but very low recall.
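The automatic tagging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sentence has already been split into segments of plain text and hyperlinked text, and that a lookup table from link target to entity type has been derived from DBPedia. The names `ENTITY_TYPES`, `tag_segments`, and `to_training_lines`, and the sample entries, are hypothetical. The output uses the tab-separated token/label layout that Stanford NER accepts for training, with `O` as the background label.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup: Wikipedia link target -> entity type derived from DBPedia.
# The entries below are invented for illustration only.
ENTITY_TYPES: Dict[str, str] = {
    "Joko_Widodo": "Person",
    "Jakarta": "Place",
    "Universitas_Indonesia": "Organization",
}

# One segment of a sentence: (surface text, link target or None if not hyperlinked).
Segment = Tuple[str, Optional[str]]


def tag_segments(
    segments: List[Segment], entity_types: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Give every token the type of its hyperlink target, or "O" otherwise."""
    tagged: List[Tuple[str, str]] = []
    for text, target in segments:
        # Unlinked text, and links whose target has no known type, stay "O".
        label = entity_types.get(target, "O") if target is not None else "O"
        for token in text.split():  # naive whitespace tokenization (assumption)
            tagged.append((token, label))
    return tagged


def to_training_lines(tagged: List[Tuple[str, str]]) -> str:
    """Render one token per line as "token<TAB>label"."""
    return "\n".join(f"{token}\t{label}" for token, label in tagged)


sentence: List[Segment] = [
    ("Joko Widodo", "Joko_Widodo"),
    ("lahir di", None),
    ("Surakarta", "Surakarta"),  # target absent from the lookup, so it stays "O"
    ("dan bekerja di", None),
    ("Jakarta", "Jakarta"),
    (".", None),
]

print(to_training_lines(tag_segments(sentence, ENTITY_TYPES)))
```

In this sketch, any mention without a hyperlink or without a known type receives the background label, so the quality of the training data depends on how completely the source text is linked and typed.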
