Enhanced Search for Arabic Language Using Latent Semantic Indexing (LSI)

Abstract

The Vector Space Model (VSM) is a document representation model that is widely used in data mining and information retrieval (IR) systems. However, it poses challenges such as a high-dimensional feature space and the loss of semantic information. Latent semantic indexing (LSI) was therefore proposed to reduce the feature dimensions and to generate semantically rich features that represent conceptual term-document associations. LSI has been applied successfully in search engines and text classification tasks. In this paper, we propose a novel approach to improve the quality of the documents retrieved by search engines for the Arabic language: we use a new extension of the LSI technique in place of standard LSI. The proposed method employs word co-occurrences to form the term-by-document matrix and evaluates documents using cosine similarity measures over that matrix. We will empirically evaluate performance on an Arabic data collection that contains at least 500 documents and at least 30,000 unique words. A test set of keywords from a specific domain will be used to evaluate the quality of the top 20-30 retrieved documents for different numbers of retained singular values (i.e., different numbers of dimensions). The proposed method will be judged by comparing its performance with that of standard LSI.
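
The standard LSI baseline mentioned in the abstract can be sketched roughly as follows. This is only an illustrative sketch of textbook LSI (raw term counts, truncated SVD, query folding, cosine ranking), not the authors' proposed extension, whose details the abstract does not give. The function names `term_document_matrix` and `lsi_rank`, the toy documents, and the whitespace tokenization (no Arabic stemming or normalization) are all assumptions made for illustration.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw-count matrix: one row per term, one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            A[row[tok], j] += 1.0
    return A, row


def lsi_rank(docs, query, k):
    """Rank documents against a query in a k-dimensional latent space."""
    A, row = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))  # never keep zero singular values
    U_k, s_k, docs_k = U[:, :k], s[:k], Vt[:k, :].T

    # Build the query's term-count vector, ignoring unknown terms.
    q = np.zeros(A.shape[0])
    for tok in query.split():
        if tok in row:
            q[row[tok]] += 1.0
    q_k = (q @ U_k) / s_k  # fold the query into the latent space

    # Cosine similarity between the folded query and each document vector.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    scores = (docs_k @ q_k) / np.maximum(norms, 1e-12)
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical toy collection: two finance documents, two sports documents.
docs = [
    "اقتصاد بنك سوق",
    "بنك سوق مال",
    "رياضة كرة فريق",
    "كرة فريق ملعب",
]
ranking = lsi_rank(docs, "اقتصاد", k=2)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Varying `k` here corresponds to the "different numbers of dimensions" the abstract plans to evaluate. In this toy collection the query term occurs only in the first document, yet the second finance document should also rank near the top because it shares co-occurring terms, which is the kind of conceptual association LSI is meant to capture.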

Bibliographic record

Source: International Conference on Intelligent and Innovative Computing Applications | 2018 | 665 p. | 4 pages
Authors: Fawaz S. Al-Anzi; Dia AbuZeina
Original format: PDF
Chinese Library Classification: TP14-532
Keywords: Large scale integration; Standards; Semantics; Matrix decomposition; Search engines; Text mining; Indexing

