International Conference on Asian Language Processing

A maximum entropy based reordering model for Mongolian-Chinese SMT with morphological information

Abstract

The differing word orders of Mongolian and Chinese and the scarcity of parallel corpora are the main problems in Mongolian-Chinese statistical machine translation (SMT). We propose a method that uses morphological information as features of a maximum entropy based phrase reordering model for Mongolian-Chinese SMT. Taking advantage of Mongolian morphology, we add Mongolian stems and affixes as phrase boundary information and use a maximum entropy model to predict the reordering of neighboring blocks. To some extent, our method alleviates the effect of data sparseness on reordering. In addition, we add part-of-speech (POS) tags as features of the reordering model. Experiments show that the approach outperforms a maximum entropy model that uses only boundary-word information and yields an improvement of up to 0.8 BLEU points over the baseline.
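As a rough illustration of the idea, the sketch below (Python, standard library only) builds features for a pair of neighboring blocks from the first source word of each block, its stem and affix, its POS tag, and the first target word, then trains a small maximum entropy classifier over the block orientation. Several pieces are assumptions, not details taken from the paper: the two-class straight/inverted formulation common in maximum-entropy BTG reordering models, the exact feature templates, the hypothetical suffix-stripping helper `split_stem_affix` (a stand-in for a real Mongolian morphological analyzer), the toy romanized tokens, and plain batch gradient ascent in place of whatever trainer the authors used.

```python
import math
from collections import defaultdict

# Orientation classes for two neighboring blocks (assumed straight/inverted setup).
LABELS = ("straight", "inverted")

# Hypothetical affix list for illustration only; a real system would call a
# Mongolian morphological analyzer instead of naive suffix stripping.
HYPOTHETICAL_SUFFIXES = ("iin", "aas", "tai", "ig")


def split_stem_affix(word):
    """Split a romanized word into (stem, affix) by the longest matching suffix."""
    for suffix in sorted(HYPOTHETICAL_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def block_features(name, src_words, src_pos, tgt_words):
    """Boundary features of one block: first source word, its stem, affix, POS, first target word."""
    stem, affix = split_stem_affix(src_words[0])
    return [
        f"{name}.src={src_words[0]}",
        f"{name}.stem={stem}",
        f"{name}.affix={affix}",
        f"{name}.pos={src_pos[0]}",
        f"{name}.tgt={tgt_words[0]}",
    ]


def pair_features(a1, a2):
    """Features for the neighboring block pair (A1, A2)."""
    return block_features("A1", *a1) + block_features("A2", *a2)


def probs(weights, feats):
    """p(o | A1, A2) = exp(sum of active feature weights for o) / Z."""
    scores = {o: sum(weights.get((f, o), 0.0) for f in feats) for o in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {o: math.exp(s - top) for o, s in scores.items()}
    z = sum(exps.values())
    return {o: e / z for o, e in exps.items()}


def train(examples, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            p = probs(weights, feats)
            for f in feats:
                for o in LABELS:
                    # observed count minus expected count under the current model
                    grad[(f, o)] += (1.0 if o == gold else 0.0) - p[o]
        for key in set(grad) | set(weights):
            weights[key] += lr * (grad[key] - l2 * weights[key])
    return dict(weights)


# Toy romanized examples (not real corpus data) mimicking Mongolian SOV vs
# Chinese SVO: object + verb blocks are inverted, subject + object stay straight.
EXAMPLES = [
    (pair_features((["nomig"], ["N"], ["书"]), (["unshsan"], ["V"], ["读"])), "inverted"),
    (pair_features((["zahidlig"], ["N"], ["信"]), (["bichsen"], ["V"], ["写"])), "inverted"),
    (pair_features((["bi"], ["PRON"], ["我"]), (["nomig"], ["N"], ["书"])), "straight"),
    (pair_features((["ter"], ["PRON"], ["他"]), (["zahidlig"], ["N"], ["信"])), "straight"),
]

if __name__ == "__main__":
    weights = train(EXAMPLES)
    # "ugiig" never occurs in training, but its affix and POS tag do.
    unseen = pair_features((["ugiig"], ["N"], ["话"]), (["helsen"], ["V"], ["说"]))
    print(probs(weights, unseen))
```

Because the unseen word `ugiig` shares its affix and POS tag with the training objects, the model can still favor an inverted order for it even though its surface form and stem carry no learned weight; this is the kind of sparsity relief that stem, affix, and POS features are meant to provide.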