Journal: Pattern Recognition Letters

The scaling problem in the pattern recognition approach to machine translation


Abstract

Statistical machine translation (SMT) has proven to be an interesting pattern recognition framework for automatically building machine translation systems from available parallel corpora. In the last few years, research in SMT has been characterized by two significant advances. The first is the popularization of so-called phrase-based statistical translation models, which allow local contextual information to be incorporated into the translation models. The second is the availability of ever larger parallel corpora, composed of millions of sentence pairs and tens of millions of running words. Since phrase-based models basically consist of statistical dictionaries of phrase pairs, their estimation from very large corpora is a very costly task that yields a huge number of parameters that must be stored in memory. The handling of millions of model parameters and a similar number of training samples has become a bottleneck in the field of SMT, as well as in other well-known pattern recognition tasks such as speech recognition or handwriting recognition, to name just a few. In this paper, we propose a general framework that deals with the scaling problem in SMT without introducing significant time overhead, by combining different scaling techniques. This new framework is based on the use of counts instead of probabilities, and on the concept of cache memory.
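The two ideas named at the end of the abstract, storing counts rather than probabilities and using a cache, can be illustrated with a toy sketch. This is not the authors' framework: the class `CountPhraseTable`, its methods, the in-memory dictionaries, and the least-recently-used eviction policy are all assumptions made for illustration. It shows only that relative-frequency probabilities can be derived from raw phrase-pair counts on demand, with a small bounded cache holding recently requested entries.

```python
from collections import Counter, OrderedDict, defaultdict


class CountPhraseTable:
    """Toy phrase table (hypothetical): keeps raw counts, derives probabilities lazily."""

    def __init__(self, cache_size=2):
        # counts[source][target] = number of times the phrase pair was extracted
        self.counts = defaultdict(Counter)
        # Bounded LRU cache: source phrase -> {target phrase: probability}
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def add(self, source, target, n=1):
        """Record n more occurrences of a phrase pair."""
        self.counts[source][target] += n
        # The counts changed, so any cached distribution for this source is stale.
        self.cache.pop(source, None)

    def translations(self, source):
        """Return p(target | source) as relative frequencies of the stored counts."""
        if source in self.cache:
            self.cache.move_to_end(source)  # mark as most recently used
            return self.cache[source]
        targets = self.counts.get(source, {})  # .get avoids creating empty entries
        total = sum(targets.values())
        probs = {t: c / total for t, c in targets.items()}
        self.cache[source] = probs
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return probs


table = CountPhraseTable(cache_size=2)
table.add("la casa", "the house", 3)
table.add("la casa", "the home", 1)
print(table.translations("la casa"))
```

Because only counts are stored, new training data can be folded in with `add` without renormalizing the whole table; the cost of normalization is paid per lookup and amortized by the cache. In a setting like the one the abstract describes, the count store would presumably live outside main memory, which this sketch does not attempt to model.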
