Workshop on Asian Translation (WAT 2017)

Patent NMT integrated with Large Vocabulary Phrase Translation by SMT at WAT 2017



Abstract

Neural machine translation (NMT) cannot handle a large vocabulary because training and decoding complexity increase in proportion to the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that occur only infrequently. Long et al. (2017) proposed selecting phrases that contain out-of-vocabulary words using the statistical approach of branching entropy. The selected phrases are then replaced with tokens during training and post-translated using the phrase translation table of SMT. In this paper, we apply the method of Long et al. (2017) to the WAT 2017 Japanese-Chinese and Japanese-English patent datasets. Evaluation on Japanese-to-Chinese, Chinese-to-Japanese, Japanese-to-English, and English-to-Japanese patent sentence translation demonstrates the effectiveness of phrases selected with branching entropy: the NMT model of Long et al. (2017) achieves a substantial improvement over a baseline NMT model trained without the proposed technique.
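The phrase-selection criterion described above can be illustrated with a minimal sketch of right branching entropy: the entropy of the distribution over tokens that immediately follow a candidate phrase in a corpus. A phrase boundary is plausible where this entropy is high (many different continuations). This is a simplified illustration of the statistic, not Long et al.'s implementation; the function name and corpus representation are assumptions.

```python
import math
from collections import Counter

def branching_entropy(corpus, ngram):
    """Right branching entropy of `ngram` (a tuple of tokens):
    entropy, in bits, of the distribution over tokens that
    immediately follow the n-gram in the tokenized corpus."""
    n = len(ngram)
    followers = Counter()
    for sent in corpus:
        for i in range(len(sent) - n):
            if tuple(sent[i:i + n]) == ngram:
                followers[sent[i + n]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0  # n-gram never seen with a successor
    return -sum((c / total) * math.log2(c / total)
                for c in followers.values())

# Toy corpus: "neural" is always followed by "network" (entropy 0,
# so the phrase likely continues), while "network" has varied
# successors (higher entropy, a plausible phrase boundary).
corpus = [
    ["neural", "network", "training"],
    ["neural", "network", "decoding"],
    ["a", "neural", "network"],
]
print(branching_entropy(corpus, ("neural",)))   # 0.0
print(branching_entropy(corpus, ("network",)))  # 1.0
```

In the full method, phrases whose boundaries score high under this statistic and that contain out-of-vocabulary words are replaced with placeholder tokens before NMT training, and the placeholders are later post-translated with an SMT phrase table.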


