Conference of the European Chapter of the Association for Computational Linguistics (EACL)

How to Produce Unseen Teddy Bears: Improved Morphological Processing of Compounds in SMT

Abstract

Compounding in morphologically rich languages is a highly productive process which often causes SMT approaches to fail because of unseen words. We present an approach for translation into a compounding language that splits compounds into simple words for training and, due to an underspecified representation, allows for free merging of simple words into compounds after translation. In contrast to previous approaches, we use features projected from the source language to predict compound mergings. We integrate our approach into end-to-end SMT and show that many compounds matching the reference translation are produced which did not appear in the training data. Additional manual evaluations support the usefulness of generalizing compound formation in SMT.
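
The split-then-merge idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_compound` and `merge_compounds`, the toy vocabulary, and the rule-based `should_merge` predicate are all assumptions made for illustration. In particular, the predicate merely stands in for the paper's merging prediction, which uses features projected from the source language, and the sketch ignores linking elements, inflection, and capitalization.

```python
from typing import Callable, List, Set


def split_compound(word: str, vocab: Set[str], min_len: int = 3) -> List[str]:
    """Split `word` into two known simple words if possible (toy heuristic).

    Hypothetical helper: tries every split point and accepts the first one
    where both parts are in `vocab`; otherwise returns the word unsplit.
    """
    lower = word.lower()
    for i in range(min_len, len(lower) - min_len + 1):
        head, tail = lower[:i], lower[i:]
        if head in vocab and tail in vocab:
            return [head, tail]
    return [lower]


def merge_compounds(
    tokens: List[str], should_merge: Callable[[str, str], bool]
) -> List[str]:
    """Merge adjacent simple words whenever `should_merge` approves the pair.

    `should_merge` is a placeholder for a learned merging decision.
    """
    merged: List[str] = []
    for i, tok in enumerate(tokens):
        if i > 0 and should_merge(tokens[i - 1], tok):
            merged[-1] += tok  # attach to the previous output token
        else:
            merged.append(tok)
    return merged


# Training side: compounds are split into simple words.
vocab = {"teddy", "bär"}
parts = split_compound("Teddybär", vocab)

# After translation: simple words may be merged back into a compound,
# including one that never appeared as a whole word in the training data.
output = merge_compounds(
    ["ein", "teddy", "bär"],
    lambda left, right: (left, right) == ("teddy", "bär"),
)
```

Because the translation model only ever sees the simple words, a merge step of this kind can in principle produce compounds absent from the training data, which is the effect the abstract reports.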
