Kernel based part of speech tagger for Kannada

Abstract

This paper presents the development of a part-of-speech (POS) tagger for the Kannada language that can be used to analyze and annotate Kannada texts. POS tagging is considered one of the basic tools and components needed by many Natural Language Processing (NLP) applications for a given language, such as speech recognition, natural language parsing, information retrieval, and information extraction. To alleviate problems for the Kannada language, we propose a new machine-learning approach to POS tagging. Identifying the ambiguities in Kannada lexical items is the challenging objective in developing an efficient and accurate POS tagger. We developed our own tagset, which consists of 30 tags, and built a POS tagger for Kannada using a Support Vector Machine (SVM). A corpus of texts extracted from Kannada newspapers and books was manually morphologically analyzed and tagged using this tagset. We evaluated the performance of the system and found the results to be more efficient and accurate than those of earlier methods for Kannada POS tagging.
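The abstract does not say which features the SVM receives, so the following is only a minimal sketch of one common way to turn a manually tagged sentence into per-token training examples for a classifier such as an SVM. The feature names, the `affix_len` parameter, the function names, the transliterated sample sentence, and the three tags are all illustrative assumptions; they are not the paper's feature set or its 30-tag tagset.

```python
from typing import Dict, List, Tuple


def token_features(tokens: List[str], i: int, affix_len: int = 3) -> Dict[str, str]:
    """Describe token i by its surface form, its affixes, and its neighbours.

    Assumption: affixes are a useful cue for a morphologically rich
    language; the abstract does not state the features actually used.
    """
    word = tokens[i]
    return {
        "word": word,
        "prefix": word[:affix_len],
        "suffix": word[-affix_len:],
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }


def sentence_to_examples(
    tokens: List[str], tags: List[str]
) -> List[Tuple[Dict[str, str], str]]:
    """Pair each token's feature dict with its manually assigned tag."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [(token_features(tokens, i), tags[i]) for i in range(len(tokens))]


# Hypothetical transliterated sentence ("I go home") with illustrative tags.
tokens = ["naanu", "manege", "hoguttene"]
tags = ["PRON", "NOUN", "VERB"]
examples = sentence_to_examples(tokens, tags)

for features, tag in examples:
    print(tag, features)
```

Each feature dict would still need to be converted to a numeric vector (for example by one-hot encoding) before it could be passed to an SVM trainer; that step is omitted here.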
