A method and apparatus for generating a thesaurus of word vectors from a corpus of documents
Abstract
A method and apparatus access relevant documents based on a query (230). A thesaurus of word vectors (242) is formed for the words in the corpus of documents (240). The word vectors represent global lexical co-occurrence patterns and relationships between word neighbors. Document vectors (246), formed by combining word vectors, lie in the same multi-dimensional space as the word vectors. A singular value decomposition is used to reduce the dimensionality of the document vectors. A query vector (232) is formed by combining the word vectors associated with the words in the query. The query vector and the document vectors are compared to determine the relevant documents. The query vector can be divided into several factor clusters to form factor vectors. The factor vectors are then compared to the document vectors to determine the ranking (252) of the documents within each factor cluster.
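The retrieval steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the function names, the symmetric co-occurrence window of two tokens, summation as the way word vectors are combined, cosine similarity as the comparison, and applying the singular value decomposition to the word co-occurrence matrix so that word, document, and query vectors all share the reduced space. The factor-cluster step is not covered.

```python
import numpy as np


def build_vocab(docs):
    """Map every distinct word in the tokenized corpus to a row index."""
    words = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(words)}


def cooccurrence_matrix(docs, vocab, window=2):
    """Count how often each word appears within `window` tokens of another.

    The window size is an assumed parameter, not taken from the abstract.
    """
    n = len(vocab)
    counts = np.zeros((n, n))
    for doc in docs:
        ids = [vocab[w] for w in doc]
        for pos, i in enumerate(ids):
            lo = max(0, pos - window)
            hi = min(len(ids), pos + window + 1)
            for other in range(lo, hi):
                if other != pos:
                    counts[i, ids[other]] += 1.0
    return counts


def reduce_word_vectors(counts, k):
    """Keep the top-k singular directions; row i is the vector for word i."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def text_vector(tokens, vocab, word_vecs):
    """Combine word vectors by summation (an assumed combination rule)."""
    ids = [vocab[w] for w in tokens if w in vocab]
    if not ids:
        return np.zeros(word_vecs.shape[1])
    return word_vecs[ids].sum(axis=0)


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_documents(query, docs, vocab, word_vecs):
    """Return document indices ordered from most to least similar to the query."""
    q = text_vector(query, vocab, word_vecs)
    scores = [cosine(q, text_vector(d, vocab, word_vecs)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

To use the sketch, tokenize the corpus into lists of words, call `build_vocab`, `cooccurrence_matrix`, and `reduce_word_vectors` once, and then call `rank_documents` for each tokenized query. Because documents and queries are built from the same word vectors, a query can match a document that shares none of its exact words, provided their words co-occur with similar neighbors in the corpus.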