Conference: 6th Workshop on Statistical Machine Translation, 2011
Productive Generation of Compound Words in Statistical Machine Translation

Abstract

In many languages the use of compound words is very productive. A common practice to reduce sparsity consists in splitting compounds in the training data. When this is done, the system incurs the risk of translating components in non-consecutive positions, or in the wrong order. Furthermore, a post-processing step of compound merging is required to reconstruct compound words in the output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order. We also propose new heuristic methods for merging components that outperform all known methods, and a learning-based method that has accuracy similar to that of the heuristic method, is better at producing novel compounds, and can operate with no background linguistic resources.
