Journal: Quality Control, Transactions

Machine Learning Based Optimized Pruning Approach for Decoding in Statistical Machine Translation



Abstract

A conventional decoding algorithm is critical to the success of any statistical machine translation system. Giving the decoder an enormous search space makes decoding unacceptably slow, so there is a trade-off between translation accuracy and decoding speed. Pruning algorithms, such as histogram pruning and threshold pruning, try to balance this trade-off. A pruning algorithm places a predefined limit on its parameters (i.e., stack size and beam threshold), which helps to improve translation quality and speed up the decoder. However, a single fixed parameter value cannot deliver a high-quality translation in optimal time for every input; the stack size and beam threshold should instead change with the structure of the text. In this paper, we identify the best stack size and beam threshold values at runtime, based on the structure and characteristics of the text, using a machine learning-based approach. These parameter values are then applied in the beam search algorithm for decoding. Our experiments on low-resource Asian languages show significant improvements in both translation accuracy and decoding time. The HindEnCorp and ILCI datasets are used as benchmark datasets, with the English-Hindi, Hindi-Marathi, Hindi-Konkani, and Bengali-Hindi language pairs. Moreover, we incorporate the proposed technique into the cube pruning algorithm for faster decoding, and we observe further improvement with this approach.
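The two pruning parameters named in the abstract can be illustrated with a minimal Python sketch. It assumes hypothesis scores are log-probabilities and that the beam threshold is an additive margin below the best score; both are assumptions, since the abstract does not fix a representation. The function `choose_params` is a hypothetical rule-based stand-in for the learned predictor, with arbitrary illustrative values; it is not the authors' model.

```python
from typing import List, Tuple

# A hypothesis is (log-probability score, partial translation). Assumed layout.
Hypothesis = Tuple[float, str]


def prune_stack(stack: List[Hypothesis], stack_size: int,
                beam_threshold: float) -> List[Hypothesis]:
    """Apply threshold pruning, then histogram pruning, to one decoder stack."""
    if not stack:
        return []
    # Rank hypotheses from best (highest log-score) to worst.
    ranked = sorted(stack, key=lambda h: h[0], reverse=True)
    best_score = ranked[0][0]
    # Threshold pruning: drop hypotheses scoring more than
    # beam_threshold below the best hypothesis.
    kept = [h for h in ranked if h[0] >= best_score - beam_threshold]
    # Histogram pruning: keep at most stack_size hypotheses.
    return kept[:stack_size]


def choose_params(source_tokens: List[str]) -> Tuple[int, float]:
    """Hypothetical placeholder for a per-sentence parameter predictor.

    The length cut-off and returned values are arbitrary illustrations,
    not figures from the paper.
    """
    if len(source_tokens) <= 10:
        return 50, 2.0   # short input: small stack, tight beam
    return 200, 5.0      # longer input: larger stack, wider beam


if __name__ == "__main__":
    stack = [(-1.0, "a"), (-1.5, "b"), (-4.0, "c"), (-1.2, "d")]
    size, threshold = choose_params("a short example sentence".split())
    print(prune_stack(stack, size, threshold))
```

A smaller stack size or tighter threshold discards more hypotheses, which speeds up decoding at the risk of pruning away the best translation; choosing the values per sentence is the idea the abstract describes.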
