A Study on Acoustic Modeling for Speech Recognition of Predominantly Monosyllabic Languages

Ekkarit MANEENOI; Visarut AHKUPUTRA; Sudaporn LUKSANEEYANAWIN; Somchai JITAPUNKUL

Abstract

This paper presents a study on acoustic modeling for speech recognition of predominantly monosyllabic languages. Various speech units used in speech recognition systems are investigated. To evaluate the effectiveness of the resulting acoustic models, the Thai language is selected, since it is a predominantly monosyllabic language and has a complex vowel system. Several experiments were carried out to find the speech unit that yields the most accurate acoustic model and the highest recognition rate. Recognition rates under the different acoustic models are reported and compared. In addition, this paper proposes a new speech unit for speech recognition, namely the onset-rhyme unit. Two models are proposed: the Phonotactic Onset-Rhyme Model (PORM) and the Contextual Onset-Rhyme Model (CORM). Each model comprises pairs of onset and rhyme units, and each pair makes up a syllable. An onset comprises an initial consonant and its transition towards the following vowel. The rhyme that follows the onset consists of a steady vowel segment and a final consonant. Experimental results show that the onset-rhyme model performs better than the other speech units. It improves on the accuracy of the inter-syllable triphone model by nearly 9.3% and on that of the context-dependent Initial-Final model by nearly 4.7% for the speaker-dependent systems using only an acoustic model, and by 5.6% and 4.5%, respectively, for the speaker-dependent systems using both acoustic and language models. The results show that the onset-rhyme models attain a high recognition rate. Moreover, they are also more efficient in terms of system complexity.
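The onset-rhyme decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Syllable` type, the function names, and the `k+a` / `a-n` label notation are hypothetical, tone is ignored, and the abstract does not say how PORM and CORM differ in their unit definitions. The sketch only shows one way a syllable could be split into an onset unit (initial consonant plus the transition into the vowel) and a rhyme unit (steady vowel plus optional final consonant), and how a unit inventory could be counted.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class Syllable(NamedTuple):
    """Hypothetical syllable record: initial consonant, vowel, optional final consonant."""
    initial: str
    vowel: str
    final: Optional[str] = None


def onset_rhyme_units(syl: Syllable) -> Tuple[str, str]:
    """Split one syllable into an (onset, rhyme) pair of unit labels.

    Assumption: the onset label carries the following vowel, to reflect the
    consonant-to-vowel transition; the rhyme label is the vowel plus the
    final consonant when one is present.
    """
    onset = f"{syl.initial}+{syl.vowel}"
    rhyme = syl.vowel if syl.final is None else f"{syl.vowel}-{syl.final}"
    return onset, rhyme


def unit_inventory(syllables: Iterable[Syllable]) -> Tuple[Set[str], Set[str]]:
    """Collect the distinct onset and rhyme labels over a set of syllables."""
    onsets: Set[str] = set()
    rhymes: Set[str] = set()
    for syl in syllables:
        onset, rhyme = onset_rhyme_units(syl)
        onsets.add(onset)
        rhymes.add(rhyme)
    return onsets, rhymes


if __name__ == "__main__":
    # Toy syllables in a made-up romanization, for illustration only.
    toy = [Syllable("k", "a", "n"), Syllable("k", "a"), Syllable("m", "a", "n")]
    onsets, rhymes = unit_inventory(toy)
    print(sorted(onsets), sorted(rhymes))
```

Because rhyme labels are shared across syllables with different initials, the inventory grows more slowly than a list of whole syllables would, which is one plausible reading of the abstract's remark about system complexity.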

Bibliographic information

Source
IEICE Transactions on Information and Systems | 2004, No. 5 | pp. 1146-1163 | 18 pages
Authors
Ekkarit MANEENOI; Visarut AHKUPUTRA; Sudaporn LUKSANEEYANAWIN; Somchai JITAPUNKUL
Affiliation
Digital Signal Processing Research Laboratory, Department of Electrical Engineering, Faculty of Engineering, Chulalongkorn University, Phaya Thai Road, Bangkok 10330, Thailand
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords
acoustic modeling; continuous speech recognition; onset-rhyme model; predominantly monosyllabic languages; Thai speech recognition
