
Design and implementation of automatic word and phrase indexing for information retrieval with Arabic documents.



Abstract

Investigation of methods of automatic information retrieval for Arabic is essential to the growth of learning in the Arab world. It is the simplest and most cost-effective way to make the resources of large reference libraries available to the increasing numbers of students and researchers in the Arab world.

We have put together a corpus of 242 abstracts of Arabic documents using the proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic retrieval system from scratch to handle Arabic data. The system is designed to support two goals: first, to test an automatic word indexing system based on three indexing methods (full words, stems, and roots); second, to test an automatic phrase indexing process using the same three indexing methods. The system was implemented in the C language using the GCC compiler and runs on IBM PCs and compatible microcomputers.

We have implemented both automatic and manual indexing techniques for this corpus, with and without phrases. A long series of experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing, and more effective in some cases. Since automatic indexing is both cheaper and faster, our results suggest that we can achieve wider coverage of the literature with less money and produce results as good as those of manual indexing.

We have also compared the results using words, stems, and roots as index terms and confirmed the finding, obtained by Al-Kharashi and Abu-Salem with smaller corpora, that root indexing is more effective than word indexing.

Our results with phrase indexing are puzzling and suggest a need for further research: use of phrases improves the results with automatic indexing but not with manual indexing.
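The abstract evaluates indexing methods by recall and precision. As a minimal sketch of the standard set-based definitions of these two measures (not the thesis's C implementation), the following Python computes both for a single query. The function name `precision_recall` and the document IDs are hypothetical and are not taken from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical document IDs for illustration only, not data from the thesis.
retrieved = [3, 17, 42, 101]
relevant = [17, 42, 58, 77, 101]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Comparing two indexing methods in this style would mean running the same queries against each index and comparing the resulting precision and recall values, typically averaged over all queries.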
