Text categorization based on co-classification learning from multilingual corpora

Abstract

The present document describes a method and a system for generating classifiers from multilingual corpora including subsets of content-equivalent documents written in different languages. When the documents are translations of each other, their classifications must be substantially the same. Embodiments of the invention utilize this similarity in order to enhance the accuracy of the classification in one language based on the classification results in the other language, and vice versa. A system in accordance with the present embodiments implements a method which comprises generating a first classifier from a first subset of the corpora in a first language; generating a second classifier from a second subset of the corpora in a second language; and re-training each of the classifiers on its respective subset based on the classification results of the other classifier, until a training cost between the classification results produced by subsequent iterations reaches a local minimum.
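
The alternating re-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not specify the classifier type or the form of the training cost, so the logistic-regression classifiers, the squared-disagreement penalty, the weight `lam`, the stopping tolerance `tol`, and all function names (`co_classify`, `train_view`, `view_cost`, `predict`) are hypothetical choices made for the example.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # Probability of the positive class; the last weight is the bias.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def view_cost(w, X, y, other_probs, lam):
    # Assumed cost: log loss on the labels plus a squared-disagreement
    # penalty against the other language's classifier.
    total = 0.0
    for x, t, q in zip(X, y, other_probs):
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        total += lam * (p - q) ** 2
    return total / len(X)

def train_view(X, y, other_probs, lam, w=None, lr=0.5, steps=200):
    # Gradient descent on view_cost with the other view's outputs held fixed.
    dim = len(X[0])
    w = list(w) if w is not None else [0.0] * (dim + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * (dim + 1)
        for x, t, q in zip(X, y, other_probs):
            p = predict(w, x)
            # Derivative of the per-example cost with respect to the logit.
            g = (p - t) + 2.0 * lam * (p - q) * p * (1.0 - p)
            for j in range(dim):
                grad[j] += g * x[j]
            grad[dim] += g
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w

def co_classify(X1, X2, y, lam=1.0, tol=1e-6, max_rounds=50):
    # X1[i] and X2[i] are the same document in language 1 and language 2.
    # Step 1: one classifier per language, trained independently (lam = 0).
    w1 = train_view(X1, y, y, 0.0)
    w2 = train_view(X2, y, y, 0.0)
    prev, costs = float("inf"), []
    for _ in range(max_rounds):
        # Step 2: re-train each classifier against the other's outputs.
        p2 = [predict(w2, x) for x in X2]
        w1 = train_view(X1, y, p2, lam, w=w1)
        p1 = [predict(w1, x) for x in X1]
        w2 = train_view(X2, y, p1, lam, w=w2)
        p2 = [predict(w2, x) for x in X2]
        cost = view_cost(w1, X1, y, p2, lam) + view_cost(w2, X2, y, p1, lam)
        costs.append(cost)
        # Step 3: stop once the combined cost no longer changes.
        if abs(prev - cost) < tol:
            break
        prev = cost
    return w1, w2, costs
```

Here `X1` and `X2` hold feature vectors for the same documents in the two languages, and `y` holds their shared binary labels. A call such as `co_classify(X1, X2, y)` returns the two weight vectors plus the per-round cost history, which can be inspected to see where the loop stopped.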

Bibliographic details

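  • Publication number: US8438009B2

  • Publication date: 2013-05-07

  • Original format: PDF

  • Applicant/assignee: MASSIH AMINI; CYRIL GOUTTE

  • Application number: US20100909389

  • Inventors: MASSIH AMINI; CYRIL GOUTTE

  • Filing date: 2010-10-21

  • Classification: G06F17/20; G06F17/27

  • Country: US

  • Database entry time: 2022-08-21 16:42:56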
