A Comparison of Hybrid and End-to-End Models for Syllable Recognition

Abstract

This paper presents a comparison of a traditional hybrid speech recognition system (kaldi using WFST and TDNN with lattice-free MMI) and a lexicon-free end-to-end model (TensorFlow implementation of a multi-layer LSTM with CTC training) for German syllable recognition on the Verbmobil corpus. The results show that explicitly modeling prior knowledge is still valuable in building recognition systems. With a strong language model (LM) based on syllables, the structured approach significantly outperforms the end-to-end model. The best word error rate (WER) regarding syllables was achieved using kaldi with a 4-gram LM modeling all syllables observed in the training set. It achieved 10.0% WER w.r.t. the syllables, compared to 27.53% for the best end-to-end model. The work presented here has implications for building future recognition systems that operate independently of a large vocabulary, as typically required in tasks such as recognition of syllabic or agglutinative languages, out-of-vocabulary techniques, keyword search indexing, and medical speech processing.
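The abstract reports word error rates computed over syllables rather than words. As a minimal sketch, assuming the standard WER definition (minimum number of substitutions, deletions, and insertions from a Levenshtein alignment, divided by the reference length) is simply applied to syllable tokens, the metric could be computed as below. The function name `syllable_error_rate` and the example syllabification are hypothetical illustrations, not the paper's scoring tooling.

```python
from typing import Sequence


def syllable_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Hypothetical helper: WER computed over syllable tokens.

    Returns (substitutions + deletions + insertions) / len(reference),
    using a Levenshtein alignment between the two token sequences.
    """
    if not reference:
        raise ValueError("reference must contain at least one syllable")
    n, m = len(reference), len(hypothesis)
    # prev[j] = edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference[i-1]
                curr[j - 1] + 1,     # insertion of hypothesis[j-1]
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[m] / n


# Hypothetical syllabification of "guten Morgen"; one syllable is dropped.
ref = ["gu", "ten", "mor", "gen"]
hyp = ["gu", "ten", "mor"]
print(syllable_error_rate(ref, hyp))  # 1 deletion / 4 syllables -> 0.25
```

Because the error rate is normalized by the reference length, insertions can push it above 100%, which is the same behavior as word-level WER.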

Bibliographic details

Source: International conference on text, speech, and dialogue | 2019 | pp. 352-360 | 9 pages
Conference location: Ljubljana (SI)
Authors: Sebastian P. Bayerl; Korbinian Riedhammer
Affiliation: Technische Hochschule Nuernberg Georg Simon Ohm, Nuremberg, Germany

Original format: PDF
Keywords: Speech recognition; Language model; CTC; End-2-End; Syllables

机译：语音识别;语言模型； CTC； End-2-End;音节;

Similar literature

1. Comparison of Syllable and Phoneme Modelling of Agglutinative Tamil Isolated Words in Speech Recognition [J]. Ibralebbe Mohamed Kalith, David Asirvatham, Ali Khatibi. British Journal of Applied Science and Technology, 2018, no. 4.
2. Syllable language models for Mandarin speech recognition: Exploiting character language models [J]. Liu X., Hieronymus J.L., Gales M.J.F. The Journal of the Acoustical Society of America, 2013, no. 1.
3. Context-dependent Syllable Modeling of Sentence-based Semi-continuous Speech Recognition for the Tamil Language [J]. Ibralebbe Mohamed Kalith, David Asirvatham, Ismail Raisal. Information Technology Journal, 2017, no. 3.
4. A Comparison of Hybrid and End-to-End Models for Syllable Recognition [C]. Sebastian P. Bayerl, Korbinian Riedhammer. International conference on text, speech, and dialogue, 2019.
5. Tonal syllable recognition for continuous Mandarin using phonetic models [D]. Wu, Jiang. 2014.
6. Unsupervised End-to-End Deep Model for Newborn and Infant Activity Recognition [O]. Kyungkoo Jun, Soonpil Choi. 2020.
7. A Comparison of Hybrid and End-to-End Models for Syllable Recognition [O]. Sebastian P. Bayerl, Korbinian Riedhammer. 2019.