Journal: Computer Speech and Language

Mandarin–English Information (MEI): investigating translingual speech retrieval

Abstract

This paper describes the Mandarin–English Information (MEI) project, in which we investigated the problem of cross-language spoken document retrieval (CL-SDR) and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as a query and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-language and cross-media retrieval task. We applied a multi-scale approach to the problem, which unifies the use of phrases, words and subwords in retrieval. The English queries are translated into Chinese by a dictionary-based approach that integrates phrase-based translation with word-by-word translation. Untranslatable named entities are transliterated by a novel subword translation technique. The multi-scale approach can be divided into three subtasks: multi-scale query formulation, multi-scale audio indexing (by speech recognition) and multi-scale retrieval. Experimental results demonstrate that phrase-based translation and subword translation gave performance gains, and that multi-scale retrieval outperformed word-based retrieval.
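
As a rough illustration of the dictionary-based query translation described in the abstract, the sketch below tries phrase lookups first and falls back to word-by-word lookup. It is not the MEI implementation: the toy lexicons, the greedy longest-match strategy, the `<OOV:...>` marker for untranslatable words, and the use of overlapping character bigrams as the subword unit are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical toy lexicons; entries are illustrative, not from the MEI project.
PHRASE_DICT: Dict[str, List[str]] = {
    "prime minister": ["总理"],
    "stock market": ["股市"],
}
WORD_DICT: Dict[str, List[str]] = {
    "visit": ["访问"],
    "china": ["中国"],
    "market": ["市场"],
}
MAX_PHRASE_LEN = 3  # longest phrase (in words) to try; an assumption


def translate_query(text: str) -> List[str]:
    """Greedy longest-match phrase lookup with word-by-word fallback."""
    tokens = text.lower().split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest multi-word phrase starting at position i first.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in PHRASE_DICT:
                out.extend(PHRASE_DICT[phrase])
                i += n
                matched = True
                break
        if matched:
            continue
        # Fall back to a single-word lookup. Words missing from both
        # lexicons are marked so a later transliteration step could
        # handle them (that step is not sketched here).
        word = tokens[i]
        out.extend(WORD_DICT.get(word, [f"<OOV:{word}>"]))
        i += 1
    return out


def char_bigrams(term: str) -> List[str]:
    """Overlapping character bigrams, one assumed choice of subword unit."""
    return [term[j:j + 2] for j in range(len(term) - 1)]
```

For example, `translate_query("Prime minister visit China")` yields the phrase-level term for "prime minister" followed by word-level terms for the rest, and `char_bigrams` can then expand each Chinese term into subword units for indexing alongside the words.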
