A method and apparatus for generating a thesaurus of word vectors from a corpus of documents

Abstract

A method and apparatus access relevant documents based on a query (230). A thesaurus of word vectors (242) is formed for the words in the corpus of documents (240). The word vectors represent global lexical co-occurrence patterns and relationships between word neighbors. Document vectors (246), formed by combining word vectors, lie in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector (232) is formed by combining the word vectors associated with the words in the query. The query vector and the document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking (252) of the documents within each factor cluster.
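The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function names (`build_word_vectors`, `combine`, `rank_documents`), the co-occurrence window size, the number of retained dimensions, and the toy corpus are all assumptions made for illustration. The sketch applies the singular value decomposition to the word co-occurrence matrix, which also reduces the document vectors because they live in the same space; the abstract does not specify this detail. Cosine similarity is assumed as the comparison measure, and the factor-cluster step is omitted.

```python
import numpy as np

def build_word_vectors(docs, window=2, dims=2):
    """Return (vocab index, reduced word vectors) from tokenized documents."""
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    cooc = np.zeros((len(vocab), len(vocab)))
    # Count how often each word appears within `window` positions of another.
    for doc in docs:
        for pos, word in enumerate(doc):
            lo, hi = max(0, pos - window), min(len(doc), pos + window + 1)
            for other in doc[lo:pos] + doc[pos + 1:hi]:
                cooc[vocab[word], vocab[other]] += 1.0
    # Truncated SVD: keep the top `dims` directions, scaled by singular values.
    u, s, _ = np.linalg.svd(cooc, full_matrices=False)
    return vocab, u[:, :dims] * s[:dims]

def combine(words, vocab, vectors):
    """Sum the vectors of known words and normalize to unit length."""
    total = np.zeros(vectors.shape[1])
    for w in words:
        if w in vocab:
            total += vectors[vocab[w]]
    norm = np.linalg.norm(total)
    return total / norm if norm > 0 else total

def rank_documents(query, docs, vocab, vectors):
    """Rank documents by cosine similarity between query and document vectors."""
    q = combine(query, vocab, vectors)
    scores = [float(q @ combine(d, vocab, vectors)) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cat", "dog", "pet"],
    ["stock", "market", "bond"],
    ["dog", "pet", "cat"],
]
vocab, vectors = build_word_vectors(docs, window=2, dims=2)
order, scores = rank_documents(["cat", "dog"], docs, vocab, vectors)
print(order, scores)
```

Because query and document vectors are both sums of word vectors, a query can match a document that shares none of its literal terms, provided their words co-occur with similar neighbors in the corpus.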

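Bibliographic data

  • Publication number: EP0687987B1
  • Publication date: 2003-06-04
  • Original format: PDF
  • Applicant/Patentee: XEROX CORPORATION
  • Application number: EP19950304116
  • Inventor: SCHUETZE HINRICH
  • Filing date: 1995-06-14
  • Classification: G06F17/30; G06F17/27
  • Country: EP
  • Database entry time: 2022-08-21 23:54:25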
