Factored Neural Machine Translation at LoResMT 2019

Abstract

Low-resource languages pose a major challenge for machine translation because large, accurate parallel corpora are unavailable. In the present work, Factored Neural Machine Translation systems have been developed for the following bidirectional language pairs: English & Bhojpuri, English & Magahi, and English & Sindhi, along with the unidirectional language pair English-Latvian. Both the lemma and the Part of Speech (PoS) tag are included as factors on the surface-level English words. No factoring has been done on the low-resource language side. The submitted systems have been developed with the parallel datasets provided; no additional parallel or monolingual data have been included. All seven systems have been evaluated by the LoResMT 2019 organizers in terms of BLEU score, Precision, Recall, and F-measure. It is observed that better evaluation scores have been obtained in those MT systems in which English is the target language. This is attributed to the incorporation of lemma and PoS tag factors for English words, which has improved vocabulary coverage and helped generalization. It is expected that incorporating linguistic factors on the low-resource language words would likewise improve the evaluation scores of the MT systems that have those languages on the target side.
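To make the idea of factoring concrete, the sketch below attaches a lemma and a PoS tag to each surface-level English word. It is only an illustration under stated assumptions: the abstract does not say which tagger, lemmatizer, or input format the submitted systems used, so the tiny lexicon `TOY_LEXICON`, the helper names `factor_token` and `factor_sentence`, and the `|` separator (a convention common in factored MT toolkits) are all hypothetical choices made here.

```python
# Hypothetical toy lexicon: surface form -> (lemma, PoS tag).
# A real system would obtain these from a lemmatizer and a PoS tagger.
TOY_LEXICON = {
    "the": ("the", "DT"),
    "child": ("child", "NN"),
    "children": ("child", "NNS"),
    "were": ("be", "VBD"),
    "playing": ("play", "VBG"),
}


def factor_token(word, lexicon=TOY_LEXICON, sep="|"):
    """Return 'surface|lemma|pos' for one word.

    Words missing from the lexicon fall back to their lowercased
    form as the lemma and to the placeholder tag 'UNK'.
    """
    lemma, pos = lexicon.get(word.lower(), (word.lower(), "UNK"))
    return sep.join((word, lemma, pos))


def factor_sentence(sentence, lexicon=TOY_LEXICON, sep="|"):
    """Factor every whitespace-separated token of an English sentence."""
    return " ".join(factor_token(w, lexicon, sep) for w in sentence.split())


if __name__ == "__main__":
    print(factor_sentence("The children were playing"))
```

In this sketch, "child" and "children" share the lemma factor `child`, which illustrates how a lemma factor can let a model relate inflected forms it has seen only rarely; that is the kind of vocabulary-coverage effect the abstract describes.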