Journal: ACM Transactions on Asian Language Information Processing

Inter-, Intra-, and Extra-Chunk Pre-Ordering for Statistical Japanese-to-English Machine Translation



Abstract

A rule-based pre-ordering approach is proposed for statistical Japanese-to-English machine translation using the dependency structure of source-side sentences. A Japanese sentence is pre-ordered to an English-like order at the morpheme level for a statistical machine translation system during the training and decoding phases to resolve the reordering problem. In this article, extra-chunk pre-ordering of morphemes is proposed, which allows Japanese functional morphemes to move across chunk boundaries. This contrasts with the intra-chunk reordering used in previous approaches, which restricts the reordering of morphemes to within a chunk. Linguistically oriented discussions show that correct pre-ordering cannot be realized without extra-chunk movement of morphemes. The proposed approach is compared with five rule-based pre-ordering approaches designed for Japanese-to-English translation and with a language-independent statistical pre-ordering approach on a standard patent dataset and on a news dataset obtained by crawling Internet news sites. Two state-of-the-art statistical machine translation systems, one phrase-based and the other hierarchical phrase-based, are used in experiments. Experimental results show that the proposed approach outperforms the compared approaches on automatic reordering measures (Kendall's τ, Spearman's ρ, fuzzy reordering score, and test set RIBES) and on test set BLEU score, an automatic translation precision measure.
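As a rough illustration of two of the reordering measures named above, the sketch below computes Kendall's τ and Spearman's ρ for a pre-ordered sentence represented as a permutation. It assumes that `perm[i]` is the reference (English-like) position of the i-th token in the pre-ordered source and that there are no ties; the function names are hypothetical, and the exact normalization used in the article may differ.

```python
from itertools import combinations


def kendall_tau(perm):
    """Kendall's tau of a tie-free permutation against the identity order."""
    n = len(perm)
    if n < 2:
        return 1.0  # a single token is trivially in order
    # Count token pairs whose relative order matches the reference order.
    concordant = sum(1 for i, j in combinations(range(n), 2) if perm[i] < perm[j])
    pairs = n * (n - 1) // 2
    return 2.0 * concordant / pairs - 1.0


def spearman_rho(perm):
    """Spearman's rho of a tie-free permutation against the identity order."""
    n = len(perm)
    if n < 2:
        return 1.0
    # Sum of squared distances between actual and reference positions.
    d2 = sum((p - i) ** 2 for i, p in enumerate(perm))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical 4-token sentence in which the two middle tokens are swapped.
example = [0, 2, 1, 3]
tau = kendall_tau(example)
rho = spearman_rho(example)
```

Both values equal 1 for a perfectly pre-ordered sentence and -1 for a fully reversed one, so a higher score indicates a word order closer to the reference.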
