Journal of Information Science

Modern information retrieval in Arabic – catering to standard and colloquial Arabic users



Abstract

The widespread use of colloquial dialects among the younger generation of Arabs is depriving many of them of the fruits of information freedom. Although most Arabs have no problem reading text in formal Arabic, widely known as Modern Standard Arabic (MSA), the younger generation is more adept at colloquial Arabic, mainly owing to the widespread use of social media. Current search engines cater mostly to MSA. This means that material written in colloquial Arabic is off-limits to those who use MSA, and, similarly, MSA content is off-limits to those who communicate only in colloquial Arabic. To achieve the full potential of an information-retrieval system, we need a scheme that interprets queries whether they are in MSA, colloquial Arabic or a combination of both. In this paper we design an information-retrieval system that addresses this concern in the context of one of the local dialects of Saudi Arabia. Our system is based on a modified MSA stemming technique and a set of lexicon-based colloquial ↔ MSA conversion rules. We tested the system using 44 queries on a corpus of over 1400 documents (MSA, colloquial and mixed). The average precision was 84.3%, and the average recall was 96.5%. In a second test, we compared the precision of the documents retrieved by our system with that of the Google and Yahoo! search engines; the respective average precisions were 78.2%, 51.9% and 56.2%.
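The pipeline outlined in the abstract (map colloquial terms to MSA through a lexicon, stem, then match queries against MSA, colloquial and mixed documents, and score the results by precision and recall) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' method. The three lexicon entries (Gulf colloquial وش "what", ابي "I want", زين "good") are hypothetical examples. The affix lists are a generic light stemmer standing in for the paper's modified MSA stemmer. Retrieval uses simple Boolean AND matching. Averages are taken as a plain (macro) mean over per-query scores. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical colloquial -> MSA lexicon (illustrative entries only; the
# paper's actual lexicon and conversion rules are not reproduced here).
COLLOQUIAL_TO_MSA: Dict[str, str] = {
    "وش": "ماذا",   # "what"
    "ابي": "أريد",  # "I want"
    "زين": "جيد",   # "good"
}

# Generic light-stemming affixes (an assumption, NOT the paper's modified stemmer).
# Longer prefixes come first so "وال" is tried before "ال".
PREFIXES = ("وال", "بال", "كال", "فال", "ال")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")
MIN_STEM = 2  # never strip an affix if fewer than 2 letters would remain


def light_stem(token: str) -> str:
    """Strip at most one known prefix and one known suffix."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= MIN_STEM:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= MIN_STEM:
            token = token[: -len(s)]
            break
    return token


def normalize(text: str) -> List[str]:
    """Convert colloquial tokens to MSA via the lexicon, then light-stem each token."""
    return [light_stem(COLLOQUIAL_TO_MSA.get(tok, tok)) for tok in text.split()]


def retrieve(query: str, docs: Dict[int, str]) -> Set[int]:
    """Boolean AND retrieval: a document matches if it contains every query term.
    Queries and documents go through the same normalization, so a colloquial
    query can match an MSA document and vice versa."""
    q_terms = set(normalize(query))
    return {doc_id for doc_id, text in docs.items() if q_terms <= set(normalize(text))}


def precision_recall(retrieved: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Per-query precision = hits/retrieved, recall = hits/relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean(values: Iterable[float]) -> float:
    """Macro-average of per-query scores (assumed averaging scheme)."""
    vals = list(values)
    return sum(vals) / len(vals) if vals else 0.0


if __name__ == "__main__":
    docs = {
        1: "الكتاب جيد",      # MSA: "the book is good"
        2: "الكتاب زين",      # colloquial "good"
        3: "المكتبات كبيرة",  # MSA: "the libraries are big"
    }
    hits = retrieve("كتاب زين", docs)
    print(sorted(hits))  # [1, 2]: the colloquial query also finds the MSA document
```

Running the colloquial query "كتاب زين" retrieves both the MSA document and the colloquial one, because زين is rewritten to جيد before matching. A real system would need a far larger lexicon and rule set, as the abstract describes.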
