SYSTEM, METHOD, AND APPARATUS FOR PAIRING A SHORT DOCUMENT TO ANOTHER SHORT DOCUMENT FROM A PLURALITY OF SHORT DOCUMENTS

Abstract

A computer-implemented method for pairing a new document with a document from a plurality of documents. Embodiments include generating, for the new document and for each of the plurality of documents, a vector of terms of interest that is uniquely associated with that document. For each term of interest, the corresponding element of the vector is set to zero if the term does not occur in the document and to one otherwise. The method also includes determining, for each document in the plurality of documents, a similarity between the vector for the new document and the vector for that document. The method further includes selecting a document from the plurality of documents as related to the new document if the similarity between the two vectors is greater than or equal to a threshold value.
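The pairing steps in the abstract (binary term-presence vectors, a per-document similarity, and a threshold test) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name a similarity measure, so cosine similarity is assumed here; the list of terms of interest is assumed to be a shared vocabulary supplied by the caller; tokenization is assumed to be lowercase alphanumeric splitting. The function names `tokenize`, `binary_vector`, `cosine_similarity`, and `pair_documents` are hypothetical.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    # Assumption: lowercase alphanumeric tokens; the abstract does not specify tokenization.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def binary_vector(text: str, terms_of_interest: list[str]) -> list[int]:
    # One element per term of interest: 1 if the term occurs in the document, 0 otherwise.
    tokens = tokenize(text)
    return [1 if term in tokens else 0 for term in terms_of_interest]


def cosine_similarity(a: list[int], b: list[int]) -> float:
    # Assumption: cosine similarity stands in for the unspecified "similarity".
    # For 0/1 vectors, the sum of the elements equals the squared norm.
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(a) * sum(b))
    return dot / denom if denom else 0.0


def pair_documents(new_doc: str, corpus: list[str],
                   terms_of_interest: list[str],
                   threshold: float) -> list[tuple[int, float]]:
    # Hypothetical helper: return (index, score) for every corpus document whose
    # similarity to the new document is greater than or equal to the threshold.
    new_vec = binary_vector(new_doc, terms_of_interest)
    related = []
    for index, doc in enumerate(corpus):
        score = cosine_similarity(new_vec, binary_vector(doc, terms_of_interest))
        if score >= threshold:
            related.append((index, score))
    return related
```

For example, with `terms = ["battery", "charger", "phone", "screen", "cracked"]`, the new document `"Phone battery drains fast"` and the corpus `["Battery drains on my phone", "Cracked screen replacement", "Charger not working"]`, calling `pair_documents(new_doc, corpus, terms, 0.5)` returns `[(0, 1.0)]`: only the first corpus document shares terms of interest with the new document. Jaccard similarity is another common choice for binary vectors and could be swapped into `cosine_similarity` without changing the rest of the flow.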