Survey of data-selection methods in statistical machine translation

Sauleh Eetemadi; William Lewis; Kristina Toutanova; Hayder Radha

Abstract

Statistical machine translation has seen significant improvements in quality over the past several years. The single biggest factor in this improvement has been the accumulation of ever larger stores of data. We now find ourselves, however, the victims of our own success, in that it has become increasingly difficult to train on such large sets of data, due to limitations in memory, processing power, and ultimately, speed (i.e. data-to-models takes an inordinate amount of time). Moreover, the training data has a wide quality spectrum. A variety of methods for data cleaning and data selection have been developed to address these issues. Each of these methods employs a search or filtering algorithm to select a subset of the data, given a defined set of feature functions. In this paper we provide a comparative overview of research in this area based on application scenario, feature functions and search method.
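The abstract's framing, a search or filtering algorithm that selects a subset of the data given a set of feature functions, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the two feature functions (`length_ratio`, `not_copied`), the weighted-sum scoring, and the two selection strategies are hypothetical stand-ins for the much richer methods the survey compares.

```python
from typing import Callable, List, Sequence, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)
FeatureFn = Callable[[SentencePair], float]


def length_ratio(pair: SentencePair) -> float:
    """Hypothetical feature: shorter/longer token count, in the range 0..1."""
    src_len = len(pair[0].split())
    tgt_len = len(pair[1].split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def not_copied(pair: SentencePair) -> float:
    """Hypothetical feature: 0.0 if the target merely repeats the source, else 1.0."""
    return 0.0 if pair[0].strip().lower() == pair[1].strip().lower() else 1.0


def score(pair: SentencePair,
          features: Sequence[FeatureFn],
          weights: Sequence[float]) -> float:
    """Combine feature values as a weighted sum (an assumed scoring rule)."""
    return sum(w * f(pair) for f, w in zip(features, weights))


def select_by_threshold(corpus: Sequence[SentencePair],
                        features: Sequence[FeatureFn],
                        weights: Sequence[float],
                        threshold: float) -> List[SentencePair]:
    """Filtering: keep every pair whose combined score reaches the threshold."""
    return [p for p in corpus if score(p, features, weights) >= threshold]


def select_top_k(corpus: Sequence[SentencePair],
                 features: Sequence[FeatureFn],
                 weights: Sequence[float],
                 k: int) -> List[SentencePair]:
    """Simple search: rank pairs by combined score and keep the k best."""
    ranked = sorted(corpus, key=lambda p: score(p, features, weights), reverse=True)
    return ranked[:k]
```

Threshold filtering suits data cleaning, where each pair can be judged on its own; ranking with a fixed budget `k` suits scenarios where the training set must shrink to a given size.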

Bibliographic record

Source: Machine Translation, 2015, no. 4, pp. 189-223 (35 pages)
Authors: Sauleh Eetemadi; William Lewis; Kristina Toutanova; Hayder Radha
Author affiliations: Michigan State University (1); Microsoft Research (2)
Original format: PDF
Language: English
Keywords: Statistical machine translation; Data selection; Data cleaning; Literature overview

Similar documents

1. Survey of data-selection methods in statistical machine translation [J]. Sauleh Eetemadi, William Lewis, Kristina Toutanova. Machine Translation, 2015, no. 3-4.
2. Patent Issued for Methods for Using Manual Phrase Alignment Data to Generate Translation Models for Statistical Machine Translation [J]. Robotics and Machine Learning, 2012, no. 32.
3. Function words in statistical machine-translated Chinese and original Chinese: A study into the translationese of machine translation systems [J]. Kuo Chen-li. Digital Scholarship in the Humanities, 2019, no. 4.
4. Transductive Data-Selection Algorithms for Fine-Tuning Neural Machine Translation [C]. Alberto Poncelas, Gideon Maillette de Buy Wenniger, Andy Way. Machine Translation Summit; Workshop on Patent and Scientific Literature Translation, 2019.
5. Modeling Relevance in Statistical Machine Translation: Scoring Alignment, Context, and Annotations of Translation Instances [D]. Phillips, Aaron B., 2012.
6. 3145 An Evaluation of Machine Learning and Traditional Statistical Methods for Discovery in Large-Scale Translational Data [O]. Megan C Hollister, Jeffrey D. Blume, 2019.
7. Machine translation and Welsh: analysing free statistical machine translation for the professional translation of an under-researched language pair [O]. Screen, Benjamin, 2017.
