
Building a Test Collection for Sorani Kurdish


Abstract

Despite having a large number of speakers, Sorani, one of the two principal branches of the Kurdish language, is among the less-resourced languages. This paper reports on the outcomes of a project aimed at providing the essential resources for processing Sorani texts. The primary output of this project is Pewan, the first standard test collection for evaluating Sorani information retrieval (IR) systems. The other language resources constructed in this project are (i) a light stemmer, (ii) a list of affixes, and (iii) a list of stopwords. We also used these newly built resources to study the effectiveness of basic IR strategies on Sorani documents. Our experimental results show that normalization and, to a lesser extent, stemming can greatly improve the performance of Sorani IR systems.
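The abstract names three preprocessing steps (normalization, stopword removal, light stemming) without giving their details. The following is a minimal Python sketch of how such a pipeline is commonly put together; it is not the project's implementation. The character map (folding Arabic kaf and yeh onto their Persian/Kurdish counterparts), the stopword set, the suffix list, and the names `normalize`, `light_stem`, and `preprocess` are all illustrative assumptions, and the toy English stopwords and suffixes stand in for the actual Sorani resources, which are not reproduced here.

```python
import unicodedata

# ASSUMPTION: illustrative character folding, not the paper's normalization rules.
# Arabic kaf (U+0643) -> keheh (U+06A9); Arabic yeh (U+064A) -> Farsi yeh (U+06CC).
CHAR_MAP = str.maketrans({"\u0643": "\u06A9", "\u064A": "\u06CC"})

# ASSUMPTION: toy placeholders, NOT the project's stopword or affix lists.
STOPWORDS = {"the", "of", "and"}
SUFFIXES = ("ing", "ed", "s")
MIN_STEM_LEN = 3  # hypothetical guard against over-stemming short words


def normalize(text: str) -> str:
    """Fold Unicode compatibility forms, unify variant letters, lowercase."""
    return unicodedata.normalize("NFKC", text).translate(CHAR_MAP).lower()


def light_stem(token: str) -> str:
    """Strip at most one suffix (longest first) if the stem stays long enough."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Normalize, split on whitespace, drop stopwords, then light-stem."""
    tokens = normalize(text).split()
    return [light_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The indexing of documents"))  # ['index', 'document']
```

Each stage can be switched off independently, which is the kind of setup needed to compare retrieval effectiveness with and without normalization or stemming.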
