Innovations in Parallel Corpus Search Tools


Abstract

Recent years have seen increased interest in, and availability of, parallel corpora. Large corpora from international organizations (e.g. European Union, United Nations, European Patent Office) or from multilingual Internet sites (e.g. OpenSubtitles) are now easily available and are used not only for statistical machine translation but also for online search by different user groups. This paper gives an overview of the different uses of parallel corpus search and of the different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.
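To make the difference between sentence-level and word-level alignment concrete, here is a minimal sketch of how word alignment can be used to collect and rank translation variants of a query word. It assumes a toy English–Dutch corpus (hypothetical, for illustration only) whose word alignments are stored as space-separated `i-j` links (source index to target index, the common "Pharaoh" format). A sentence-aligned system could only return the matching sentence pairs; with word links, the aligned target words can be counted directly.

```python
from collections import Counter

# Hypothetical toy corpus: (source tokens, target tokens, word alignment as "i-j" links).
corpus = [
    ("the house is small".split(), "het huis is klein".split(), "0-0 1-1 2-2 3-3"),
    ("the house is old".split(), "het gebouw is oud".split(), "0-0 1-1 2-2 3-3"),
    ("my house".split(), "mijn huis".split(), "0-0 1-1"),
]


def parse_alignment(links):
    """Turn 'i-j i-j ...' into a list of (source_index, target_index) pairs."""
    return [tuple(map(int, link.split("-"))) for link in links.split()]


def translation_variants(corpus, query):
    """Count the target words aligned to each occurrence of `query` on the source side.

    Returns (target_word, count) pairs sorted by descending frequency. A source word
    linked to several target words contributes one count per linked word.
    """
    counts = Counter()
    for src, tgt, links in corpus:
        for i, j in parse_alignment(links):
            if src[i] == query:
                counts[tgt[j]] += 1
    return counts.most_common()


print(translation_variants(corpus, "house"))
# → [('huis', 2), ('gebouw', 1)]
```

A real system would read alignments produced by an automatic word aligner and would also group multi-word translations; this sketch keeps single-token links only to show the principle of sorting translation variants by frequency.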
