
A method and apparatus for retrieving relevant documents from a corpus of documents


Abstract

A method and apparatus access relevant documents based on a query (230). A thesaurus of word vectors (242) is formed for the words in the corpus of documents (240). The word vectors represent global lexical co-occurrence patterns and relationships between neighboring words. Document vectors (246), which are formed by combining word vectors, lie in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector (232) is formed by combining the word vectors associated with the words in the query. The query vector and the document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking (252) of the documents within each factor cluster.
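The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the sliding-window co-occurrence count, the choice to apply the singular value decomposition to the co-occurrence matrix (so that word, document, and query vectors share one reduced space), and the use of cosine similarity for the comparison are all assumptions made for illustration. Factor clustering of the query is left out.

```python
import numpy as np

def build_cooccurrence(docs, window=2):
    """Count how often two words occur within `window` tokens of each other.
    The window-based count is an assumption, not taken from the patent."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for d in docs:
        tokens = d.split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[tokens[j]]] += 1
    return index, counts

def reduce_with_svd(counts, k):
    """Keep the top-k singular directions as k-dimensional word vectors."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

def text_vector(text, index, word_vecs):
    """Combine (here: sum) the word vectors of the known words in `text`."""
    rows = [index[w] for w in text.split() if w in index]
    if not rows:
        return np.zeros(word_vecs.shape[1])
    return word_vecs[rows].sum(axis=0)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank(query, docs, k=3, window=2):
    """Return document indices ordered by similarity to the query, plus scores."""
    index, counts = build_cooccurrence(docs, window)
    word_vecs = reduce_with_svd(counts, k)
    doc_vecs = [text_vector(d, index, word_vecs) for d in docs]
    q = text_vector(query, index, word_vecs)
    scores = [cosine(q, v) for v in doc_vecs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Calling `rank("cat sat mat", ["cat sat mat", "dog barked loud", "cat chased dog"])` returns a document ordering together with one similarity score per document. Because documents and queries are built from the same word vectors, a query can match a document that shares none of its words, provided the words have similar co-occurrence patterns.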

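Bibliographic data

  • Publication number: EP0687987A1

    Patent type:

  • Publication date: 1995-12-20

    Original format: PDF

  • Applicant/Assignee: XEROX CORPORATION

    Application number: EP19950304116

  • Inventor: SCHUETZE HINRICH

    Filing date: 1995-06-14

  • Classification: G06F17/30; G06F17/27

  • Country: EP

  • Database entry time: 2022-08-22 03:48:15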
