International Conference on Translating and the Computer; 29-30 November 2007; London (GB)

Building a bilingual dictionary from movie subtitles based on inter-lingual triggers



Abstract

This paper focuses on two aspects of Machine Translation: parallel corpora and translation models. First, we present a method to automatically build parallel corpora from subtitle files gathered from the Internet, which yields useful data for Subtitling Machine Translation. Our method is based on Dynamic Time Warping (DTW). We evaluated this alignment method against a sample aligned by hand and obtained an alignment precision of 0.92. Second, we use the notion of inter-lingual triggers to build multilingual dictionaries and translation tables for machine translation from the subtitle parallel corpora. Inter-lingual triggers make it possible to detect pairs of source and target words in parallel corpora: the Mutual Information measure used to determine them allows us to hypothesize that a word in the source language is a translation of a word in the target language. We evaluate the resulting dictionary by comparing it with two existing dictionaries. We then integrated the obtained translation tables into a complete translation decoding process provided by Pharaoh (Koehn, 2004) and compared the translation performance obtained with our tables to that obtained with the Giza++ tool (Al-Onaizan et al., 1999). The system tuned for our tables improves the BLEU score (Papineni et al., 2001) by 2.2% compared with the system using Giza++.
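The abstract does not state the cost function used by the DTW alignment. As a minimal sketch, assuming each subtitle is reduced to its start and end timestamps and that the local cost between two subtitles is the sum of the absolute differences of those timestamps, aligning two subtitle files with DTW could look like this. `Subtitle`, `time_cost` and `dtw_align` are hypothetical names, and the timing-based cost is an assumption, not the authors' published choice.

```python
from typing import List, Tuple

# Hypothetical representation: (start_seconds, end_seconds, text)
Subtitle = Tuple[float, float, str]


def time_cost(a: Subtitle, b: Subtitle) -> float:
    """Assumed local cost: distance between the two subtitles' time spans."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def dtw_align(src: List[Subtitle], tgt: List[Subtitle]) -> List[Tuple[int, int]]:
    """Return the DTW warping path as (source index, target index) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning src[:i] with tgt[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = time_cost(src[i - 1], tgt[j - 1])
            # Allowed moves: match, or stay on one side (one-to-many alignment)
            D[i][j] = c + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end of both files to recover the path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path
```

Because DTW allows vertical and horizontal moves, one subtitle in one language can be paired with several consecutive subtitles in the other, which is how splits and merges between the two files would surface in the path.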
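To illustrate how Mutual Information can rank candidate translations, here is a minimal sketch of inter-lingual trigger extraction. It assumes whitespace tokenization, co-occurrence counted per aligned subtitle pair, and the weighted form MI(s, t) = P(s, t) log(P(s, t) / (P(s) P(t))); the paper's exact estimator may differ. The function name `inter_lingual_triggers` and the top-`k` cutoff are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def inter_lingual_triggers(
    pairs: List[Tuple[str, str]], k: int = 3
) -> Dict[str, List[Tuple[str, float]]]:
    """For each source word, return its k target words with the highest MI."""
    n = len(pairs)
    src_count: Counter = Counter()
    tgt_count: Counter = Counter()
    joint_count: Counter = Counter()
    for src, tgt in pairs:
        # Count each word at most once per aligned pair (document frequency)
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint_count.update(product(s_words, t_words))
    triggers: Dict[str, List[Tuple[str, float]]] = {}
    for (s, t), c in joint_count.items():
        p_st = c / n
        p_s = src_count[s] / n
        p_t = tgt_count[t] / n
        # Assumed MI form: joint probability times the log-ratio
        mi = p_st * math.log(p_st / (p_s * p_t))
        triggers.setdefault(s, []).append((t, mi))
    return {
        s: sorted(cands, key=lambda x: x[1], reverse=True)[:k]
        for s, cands in triggers.items()
    }
```

Keeping only the best-scoring target words per source word gives candidate dictionary entries; their scores could then be normalized into a translation table, though how the authors perform that step is not described in the abstract.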


