Method of Augmenting Korean Classical Literature Corpus for Machine Translation Model

Abstract

The present invention relates to a method of augmenting a classical Chinese-text corpus for a machine translation model, and more particularly to a method that augments a parallel corpus built for training by applying at least one of the following techniques: punctuation marks, Chinese character noise, translation stage noise, back-translation, sentence segmentation, and dictionary extraction. In the method according to an embodiment of the present invention, the parallel corpus built for training is supplied to an input unit as the source language (source); an augmentation unit augments it using one or more of punctuation marks, Chinese character (original text) noise (A), translation stage noise (B), back-translation (C), sentence segmentation (D), and dictionary extraction (E); and the result is output to an output unit as the target language (target), thereby increasing the size of the corpus.
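
The abstract names the augmentation techniques but does not say how any of them is implemented. The following is a minimal Python sketch of three of them (punctuation, Chinese character noise, and sentence segmentation) applied to a parallel corpus held as (source, target) pairs. Every function name, the choice to remove punctuation, the character-dropping noise scheme, the delimiters, and the sample sentence pair are assumptions made for illustration and are not taken from the patent. Back-translation and dictionary extraction are left out because they need a trained model and a dictionary resource.

```python
import random

# Hypothetical punctuation set for the source side; the patent does not list one.
SOURCE_MARKS = "，。、；："


def strip_punctuation(text, marks=SOURCE_MARKS):
    """Punctuation variant (assumed): return the text with punctuation removed."""
    return "".join(ch for ch in text if ch not in marks)


def add_char_noise(text, rate, rng):
    """Character noise (assumed scheme): drop each character with probability `rate`."""
    kept = [ch for ch in text if rng.random() >= rate]
    # Never return an empty source sentence.
    return "".join(kept) if kept else text


def split_pairs(source, target, src_delim="。", tgt_delim="."):
    """Sentence segmentation (assumed): split an aligned pair into sentence pairs.

    Returns an empty list unless both sides split into the same number of
    sentences (at least two), since the alignment would otherwise be unknown.
    """
    src_parts = [s.strip() for s in source.split(src_delim) if s.strip()]
    tgt_parts = [t.strip() for t in target.split(tgt_delim) if t.strip()]
    if len(src_parts) != len(tgt_parts) or len(src_parts) < 2:
        return []
    return list(zip(src_parts, tgt_parts))


def augment(corpus, rate=0.1, seed=0):
    """Return the original pairs followed by the augmented pairs."""
    rng = random.Random(seed)
    out = list(corpus)
    for src, tgt in corpus:
        out.append((strip_punctuation(src), tgt))
        out.append((add_char_noise(src, rate, rng), tgt))
        out.extend(split_pairs(src, tgt))
    return out


if __name__ == "__main__":
    # Illustrative sample pair, not data from the patent.
    sample = [("學而時習之。不亦說乎。", "To learn and practice it. Is that not a joy.")]
    for pair in augment(sample):
        print(pair)
```

In this sketch the original pairs are kept at the front of the output so the augmented corpus is a superset of the input, matching the stated goal of increasing corpus size. A real pipeline would likely also deduplicate the result.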

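Bibliographic data

  • Publication number: KR102395811B1
  • Publication date: 2022-05-09
  • Original format: PDF
  • Applicant/Assignee: 주식회사 엘솔루
  • Application number: KR20210163048
  • Inventors: 이영; 오영대; 김우균
  • Filing date: 2021-11-24
  • Classification codes: G06N5/02; G06F40/174; G06F40/53; G06F40/58; G06N20
  • Country: KR
  • Database entry time: 2022-08-25 00:51:50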
