Journal: Machine Translation

Survey of data-selection methods in statistical machine translation

Abstract

Statistical machine translation has seen significant improvements in quality over the past several years. The single biggest factor in this improvement has been the accumulation of ever larger stores of data. We now find ourselves, however, the victims of our own success, in that it has become increasingly difficult to train on such large sets of data, due to limitations in memory, processing power, and ultimately, speed (i.e. data-to-models takes an inordinate amount of time). Moreover, the training data has a wide quality spectrum. A variety of methods for data cleaning and data selection have been developed to address these issues. Each of these methods employs a search or filtering algorithm to select a subset of the data, given a defined set of feature functions. In this paper we provide a comparative overview of research in this area based on application scenario, feature functions and search method.
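As a rough illustration of the framing in the abstract (feature functions scoring each sentence pair, followed by a filtering or search step that picks a subset), a minimal sketch might look like the following. It is not taken from the paper: the two feature functions (`length_ratio`, `make_vocab_overlap`), the linear weighting, and the threshold and top-k selectors are all hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence, Set, Tuple

SentencePair = Tuple[str, str]           # (source sentence, target sentence)
Feature = Callable[[SentencePair], float]


def length_ratio(pair: SentencePair) -> float:
    """Illustrative cleaning feature: closer to 1.0 when lengths are similar."""
    n_src, n_tgt = len(pair[0].split()), len(pair[1].split())
    if n_src == 0 or n_tgt == 0:
        return 0.0
    return min(n_src, n_tgt) / max(n_src, n_tgt)


def make_vocab_overlap(in_domain_vocab: Set[str]) -> Feature:
    """Illustrative relevance feature: share of source tokens in a given vocabulary."""
    def overlap(pair: SentencePair) -> float:
        tokens = pair[0].lower().split()
        if not tokens:
            return 0.0
        return sum(t in in_domain_vocab for t in tokens) / len(tokens)
    return overlap


def score(pair: SentencePair, features: Sequence[Feature],
          weights: Sequence[float]) -> float:
    """Combine feature values with a weighted sum (an assumed combination rule)."""
    return sum(w * f(pair) for f, w in zip(features, weights))


def select_by_threshold(corpus: Sequence[SentencePair], features: Sequence[Feature],
                        weights: Sequence[float], threshold: float) -> List[SentencePair]:
    """Filtering: keep every pair whose combined score reaches the threshold."""
    return [p for p in corpus if score(p, features, weights) >= threshold]


def select_top_k(corpus: Sequence[SentencePair], features: Sequence[Feature],
                 weights: Sequence[float], k: int) -> List[SentencePair]:
    """Simplest search: rank all pairs by score and keep the k best."""
    ranked = sorted(corpus, key=lambda p: score(p, features, weights), reverse=True)
    return ranked[:k]
```

The threshold variant judges each pair independently, which suits data cleaning; the top-k variant fixes the size of the selected subset, which suits cases where training cost is the constraint. Real systems would use richer features and possibly a search that accounts for interactions between selected pairs.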
