International Conference on Asian Language Processing

Building an Indonesian rule-based part-of-speech tagger


Abstract

This paper describes a rule-based part-of-speech tagger for the Indonesian language. The system tokenizes documents, taking multi-word expressions into account, and recognizes named entities. It then assigns tags to every token, starting with closed-class words and proceeding to open-class words, and disambiguates the tags using a set of manually defined rules. The system currently achieves an accuracy of 79% on a manually tagged corpus of roughly 250,000 tokens.
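The pipeline outlined in the abstract (tokenization with multi-word expressions, closed-class lookup before open-class tagging, then rule-based disambiguation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the tag names, the capitalization and prefix heuristics, and the single disambiguation rule for "bisa" are hypothetical and are not taken from the paper, whose actual resources and rules the abstract does not list.

```python
import re

# Hypothetical toy resources; the paper's real lexicon, tagset and rules
# are not given in the abstract.
MULTI_WORD = {("rumah", "sakit"): "NN"}  # "rumah sakit" = hospital
CLOSED_CLASS = {
    "dia": {"PRP"}, "di": {"IN"}, "ke": {"IN"}, "dan": {"CC"},
    "yang": {"SC"}, "tidak": {"NEG"},
    "bisa": {"MD", "NN"},  # ambiguous: modal "can" or noun "venom"
}


def tokenize(text):
    """Split into words and punctuation, merging known two-word expressions."""
    words = re.findall(r"\w+|[^\w\s]", text)
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        if len(pair) == 2 and pair in MULTI_WORD:
            tokens.append(" ".join(words[i:i + 2]))
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def candidate_tags(token, is_first):
    """Return the set of possible tags: closed-class lookup first, then open-class guesses."""
    low = token.lower()
    mwe_key = tuple(low.split())
    if mwe_key in MULTI_WORD:
        return {MULTI_WORD[mwe_key]}
    if low in CLOSED_CLASS:
        return set(CLOSED_CLASS[low])
    if not token[0].isalnum():
        return {"PUNCT"}
    # Crude stand-in for named-entity recognition: capitalized, not sentence-initial.
    if token[0].isupper() and not is_first:
        return {"NNP"}
    # Crude open-class heuristic: common verbal prefixes (will misfire on some nouns).
    if low.startswith(("me", "ber")):
        return {"VB"}
    return {"NN"}


def disambiguate(candidates):
    """Pick one tag per token using hand-written contextual rules."""
    tags = []
    for i, cands in enumerate(candidates):
        if len(cands) == 1:
            tags.append(next(iter(cands)))
        elif cands == {"MD", "NN"}:
            # Hypothetical rule: read "bisa" as a modal only when a verb follows.
            nxt = candidates[i + 1] if i + 1 < len(candidates) else set()
            tags.append("MD" if "VB" in nxt else "NN")
        else:
            tags.append(sorted(cands)[0])  # arbitrary fallback for unhandled ambiguity
    return tags


def tag(text):
    """Run the full toy pipeline and return (token, tag) pairs."""
    tokens = tokenize(text)
    candidates = [candidate_tags(t, i == 0) for i, t in enumerate(tokens)]
    return list(zip(tokens, disambiguate(candidates)))


print(tag("Dia bisa membaca buku di rumah sakit."))
```

In this sketch, "rumah sakit" is kept as a single token, and "bisa" is tagged as a modal because the following word carries a verbal prefix. A real system would need a far larger lexicon, a proper named-entity recognizer, and many more disambiguation rules.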
