IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Improving language models for ASR using translated in-domain data

Abstract

Acquiring in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming and tedious process. In this work, we propose using machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech conversational telephone speech (CTS) recognition system. The translated transcripts are used as additional language model training data in a scenario where the baseline language model is trained on out-of-domain and close-domain data only. We report perplexities, out-of-vocabulary (OOV) rates, and word error rates, and examine different data sets and translators for their suitability for the described task.
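To make the reported metrics concrete, the following is a minimal sketch of how OOV rate and perplexity might be compared before and after adding translated transcripts to the language model training data. It is not the authors' setup: the add-one-smoothed unigram model is a deliberate simplification, the function names are hypothetical, and the English toy sentences are placeholders standing in for Czech text.

```python
import math
from collections import Counter


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training data."""
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of an add-one-smoothed unigram model (a simplification,
    not the model used in the paper)."""
    counts = Counter(train_tokens)
    vocab_size = len(counts) + 1  # one extra slot shared by unseen words
    total = len(train_tokens)
    log_prob = 0.0
    for tok in test_tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(test_tokens))


# Toy placeholder data (assumption): out-of-domain baseline text,
# "translated" conversational transcripts, and an in-domain test set.
baseline = "the parliament approved the new budget after a long debate".split()
translated = "yeah so how are you doing today so how are you".split()
in_domain_test = "so how are you today".split()

# Adding the translated transcripts to the training pool.
augmented = baseline + translated

for name, train in [("baseline", baseline), ("augmented", augmented)]:
    print(name,
          "OOV rate:", oov_rate(train, in_domain_test),
          "perplexity:", round(unigram_perplexity(train, in_domain_test), 2))
```

On this toy data the OOV rate drops from 1.0 to 0.0 and perplexity falls once the translated text is added. A real evaluation would use higher-order n-gram models and measure word error rate with a full recognizer.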
