Conference: International Conference on Text, Speech, and Dialogue
A Comparison of Hybrid and End-to-End Models for Syllable Recognition

Abstract

This paper compares a traditional hybrid speech recognition system (Kaldi, using WFSTs and a TDNN trained with lattice-free MMI) with a lexicon-free end-to-end model (a TensorFlow implementation of a multi-layer LSTM trained with CTC) for German syllable recognition on the Verbmobil corpus. The results show that explicitly modeling prior knowledge is still valuable in building recognition systems. With a strong syllable-based language model (LM), the structured approach significantly outperforms the end-to-end model. The best word error rate (WER), measured over syllables, was achieved using Kaldi with a 4-gram LM modeling all syllables observed in the training set: 10.0%, compared with a best WER of 27.53% for the end-to-end approach. The work presented here has implications for building future recognition systems that operate independently of a large vocabulary, as typically used in tasks such as recognition of syllabic or agglutinative languages, out-of-vocabulary techniques, keyword search indexing, and medical speech processing.
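
The abstract reports WER measured over syllables but does not say how it was computed. As a minimal sketch, assuming the standard definition (Levenshtein distance between reference and hypothesis token sequences, divided by the reference length), a syllable-level error rate could be calculated as follows. The function name `syllable_error_rate` and the example syllabification are illustrative assumptions, not taken from the paper.

```python
def syllable_error_rate(reference, hypothesis):
    """Edit distance over syllable tokens divided by the reference length.

    Assumes the standard WER definition; the paper's exact scoring
    procedure is not given in the abstract.
    """
    if not reference:
        raise ValueError("reference must contain at least one syllable")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # i deletions turn the reference prefix into an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1       # reference syllable missing from hypothesis
            insertion = curr[j - 1] + 1  # extra syllable in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example: one substituted syllable out of four.
reference = ["gu", "ten", "mor", "gen"]
hypothesis = ["gu", "tem", "mor", "gen"]
rate = syllable_error_rate(reference, hypothesis)
```

Under this definition the rate can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as an error rate rather than an accuracy.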
