MULTILINGUAL DOCUMENT-SIMILARITY-DEGREE LEARNING DEVICE, MULTILINGUAL DOCUMENT-SIMILARITY-DEGREE DETERMINATION DEVICE, MULTILINGUAL DOCUMENT-SIMILARITY-DEGREE LEARNING METHOD, MULTILINGUAL DOCUMENT-SIMILARITY-DEGREE DETERMINATION METHOD, AND STORAGE MEDIUM

Abstract

This invention provides a technology for searching for similar documents in a multilingual document group at lower cost and with higher precision, even if three or more languages are present. This multilingual document-similarity-degree learning device (1) comprises the following: a multilingual matrix storage unit (11) that holds a matrix for each target language; a word-vector acquisition unit (12) that acquires a word vector corresponding to a document; a meaning-vector creation unit (13) that creates a meaning vector for said document on the basis of the word vector for said document and the matrix corresponding to the language in which said document is written; a similarity-degree calculation unit (14) that calculates similarity degrees on the basis of meaning vectors for documents in a document group; and a multilingual matrix learning unit (15) that implements learning by adjusting values in the matrices corresponding to the respective target languages such that, within a set of documents each written in one of the target languages, the similarity degrees for groups of documents that exhibit source-translation relationships are higher than the similarity degrees for groups of documents that do not exhibit source-translation relationships.
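The data flow through units (11) to (15) can be illustrated with a small sketch. The abstract does not specify the form of the word vector, the similarity function, or the learning rule, so everything concrete here is an assumption made for illustration: bag-of-words counts as word vectors, a dot product as the similarity degree, and a margin-based ranking loss minimized by stochastic gradient descent. The class name `MultilingualSimilarityModel`, the helper names, and the toy three-language corpus are hypothetical and do not come from the patent.

```python
import itertools
import random


def word_vector(tokens, vocab):
    """Unit (12), assumed form: bag-of-words counts over a per-language vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class MultilingualSimilarityModel:
    """Hypothetical sketch: one matrix per language, mapping into a shared space."""

    def __init__(self, vocab_sizes, dim, seed=0):
        rng = random.Random(seed)
        # Unit (11): matrices[lang] has `dim` rows and one column per vocabulary word.
        self.matrices = {
            lang: [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(dim)]
            for lang, n in vocab_sizes.items()
        }

    def meaning_vector(self, lang, vec):
        """Unit (13): project a word vector with the matrix of its own language."""
        return [dot(row, vec) for row in self.matrices[lang]]

    def similarity(self, doc_a, doc_b):
        """Unit (14), assumed form: dot product of the two meaning vectors."""
        return dot(self.meaning_vector(*doc_a), self.meaning_vector(*doc_b))

    def _add_outer(self, lang, u, x, scale):
        """In place: matrices[lang] += scale * outer(u, x)."""
        for row, u_i in zip(self.matrices[lang], u):
            for j, x_j in enumerate(x):
                row[j] += scale * u_i * x_j

    def train_step(self, src, pos, neg, margin=1.0, lr=0.1):
        """Unit (15), assumed rule: one SGD step on a margin ranking loss.

        `pos` is a translation of `src`; `neg` is an unrelated document.
        """
        loss = margin - self.similarity(src, pos) + self.similarity(src, neg)
        if loss <= 0.0:
            return 0.0  # translation pair already outranks the unrelated pair
        # Compute every meaning vector before modifying any matrix.
        v_src, v_pos, v_neg = (self.meaning_vector(*d) for d in (src, pos, neg))
        diff = [p - n for p, n in zip(v_pos, v_neg)]
        self._add_outer(src[0], diff, src[1], lr)
        self._add_outer(pos[0], v_src, pos[1], lr)
        self._add_outer(neg[0], v_src, neg[1], -lr)
        return loss


# Toy corpus in three languages (illustrative data only).
vocabs = {
    "en": {"cat": 0, "dog": 1},
    "de": {"katze": 0, "hund": 1},
    "ja": {"neko": 0, "inu": 1},
}
corpus = {
    "cat": {"en": ["cat"], "de": ["katze"], "ja": ["neko"]},
    "dog": {"en": ["dog"], "de": ["hund"], "ja": ["inu"]},
}
# Each document is a (language, word vector) pair.
docs = {
    (topic, lang): (lang, word_vector(tokens, vocabs[lang]))
    for topic, by_lang in corpus.items()
    for lang, tokens in by_lang.items()
}

model = MultilingualSimilarityModel({l: len(v) for l, v in vocabs.items()}, dim=4)
for _ in range(200):
    for a, b in itertools.permutations(vocabs, 2):
        for topic, other in (("cat", "dog"), ("dog", "cat")):
            model.train_step(docs[topic, a], docs[topic, b], docs[other, b])

print("translation pair:", model.similarity(docs["cat", "en"], docs["cat", "ja"]))
print("unrelated pair:  ", model.similarity(docs["cat", "en"], docs["dog", "ja"]))
```

Because every language has its own matrix into one shared meaning space, adding a further language in this sketch means adding one more matrix rather than one model per language pair, which is one plausible reading of the abstract's claim about handling three or more languages at lower cost.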

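Bibliographic details

  • Publication number: WO2015145981A1

  • Publication date: 2015-10-01

  • Original format: PDF

  • Applicant/patentee: NEC CORPORATION

  • Application number: WO2015JP01028

  • Inventor: SADAMASA KUNIHIKO

  • Filing date: 2015-02-27

  • Classification: G06F17/30; G06F17/27

  • Country: WO

  • Database entry time: 2022-08-21 15:03:58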
