...
International Journal of Computational Science and Engineering

Including category information as supplements in latent semantic analysis of Hindi documents

Abstract

Latent semantic analysis (LSA) is a mathematical model that captures the semantic structure of documents from the correlations between the textual elements they contain. LSA captures this structure well without relying on external sources of semantics; however, its performance improves when it is supplemented with additional information. This paper modifies the model so that, when analysing word correlations in documents, it also takes document category information into account as a supplement. The enhanced model is called supplemented latent semantic analysis (SLSA). SLSA is evaluated empirically in a document classification application by comparing its classification accuracy against that of plain LSA under several term weighting schemes. With tf, idf and tfidf weighting of the initial term-by-document matrix, SLSA improves classification accuracy over plain LSA by 1.14%, 1.30% and 1.63%, respectively.
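The abstract does not say exactly how the category information enters the model, so the following is only a minimal sketch under stated assumptions, not the authors' method. It builds plain LSA as a rank-k truncated SVD of a weighted term-by-document matrix (tf, idf or tfidf), and models the "supplement" as one extra indicator row per category appended to the training matrix before the SVD, so that term-category correlations shape the latent space. All function names (`idf_vector`, `apply_weight`, `supplement`, `classify`, ...), the `category_weight` parameter, the reading of the "idf" scheme as binary occurrence times idf, and the 1-nearest-neighbour cosine classifier are hypothetical choices made for illustration.

```python
import numpy as np


def idf_vector(train_counts: np.ndarray) -> np.ndarray:
    """idf_t = log(N / df_t), computed on the training terms x documents matrix."""
    n_docs = train_counts.shape[1]
    df = np.count_nonzero(train_counts, axis=1)        # documents containing term t
    return np.log(n_docs / np.maximum(df, 1))          # max() guards terms never seen


def apply_weight(counts: np.ndarray, idf: np.ndarray, scheme: str) -> np.ndarray:
    """Weight a terms x documents count matrix with one of three schemes."""
    if scheme == "tf":
        return counts.astype(float)                    # raw term frequency
    if scheme == "idf":
        return (counts > 0) * idf[:, None]            # ASSUMED: binary occurrence x idf
    if scheme == "tfidf":
        return counts * idf[:, None]                   # term frequency x idf
    raise ValueError(f"unknown weighting scheme: {scheme}")


def supplement(matrix, labels, n_classes, category_weight=1.0):
    """Hypothetical SLSA step: append one category-indicator row per class."""
    rows = np.zeros((n_classes, matrix.shape[1]))
    rows[labels, np.arange(matrix.shape[1])] = category_weight
    return np.vstack([matrix, rows])


def lsa_basis(matrix, k):
    """Rank-k truncated SVD: keep the top-k left singular vectors and values."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k]


def fold_in(docs, u_k, s_k):
    """Project document columns into the latent space: S_k^-1 U_k^T d."""
    return (u_k.T @ docs) / s_k[:, None]


def nearest_neighbour(train_latent, train_labels, test_latent):
    """1-NN classifier using cosine similarity in the latent space."""
    a = train_latent / (np.linalg.norm(train_latent, axis=0, keepdims=True) + 1e-12)
    b = test_latent / (np.linalg.norm(test_latent, axis=0, keepdims=True) + 1e-12)
    return train_labels[np.argmax(b.T @ a, axis=1)]


def classify(train_counts, train_labels, test_counts, scheme="tfidf", k=2, supplemented=True):
    """Plain LSA (supplemented=False) or the SLSA-style variant (supplemented=True)."""
    idf = idf_vector(train_counts)                     # idf always comes from training data
    train = apply_weight(train_counts, idf, scheme)
    test = apply_weight(test_counts, idf, scheme)
    n_terms = train.shape[0]
    if supplemented:
        basis_input = supplement(train, train_labels, int(train_labels.max()) + 1)
    else:
        basis_input = train
    u_k, s_k = lsa_basis(basis_input, k)
    pad = u_k.shape[0] - n_terms                       # category rows (0 for plain LSA)

    def project(m):
        # Documents carry no category entries at projection time, so the
        # category slots are zero-filled before folding in.
        return fold_in(np.vstack([m, np.zeros((pad, m.shape[1]))]), u_k, s_k)

    return nearest_neighbour(project(train), train_labels, project(test))


if __name__ == "__main__":
    # Toy corpus (terms x documents): terms 0-2 belong to class 0, terms 3-5 to class 1.
    train_counts = np.array([[2, 0, 0, 0],
                             [1, 1, 0, 0],
                             [0, 2, 0, 0],
                             [0, 0, 2, 0],
                             [0, 0, 1, 1],
                             [0, 0, 0, 2]])
    train_labels = np.array([0, 0, 1, 1])
    test_counts = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
    for scheme in ("tf", "idf", "tfidf"):
        for supplemented in (False, True):
            pred = classify(train_counts, train_labels, test_counts, scheme, 2, supplemented)
            print(scheme, "SLSA" if supplemented else "LSA", pred)
```

Because unlabelled test documents have no category entries, this sketch folds in both training and test documents with zero-filled category slots, so the appended rows influence only the orientation of the latent basis. Comparing the two `supplemented` settings for each weighting scheme on a real labelled corpus mirrors the kind of LSA-versus-SLSA comparison the abstract describes, though the toy data here says nothing about the reported accuracy gains.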