Journal: IEICE Transactions on Information and Systems

A Study on Acoustic Modeling for Speech Recognition of Predominantly Monosyllabic Languages

Abstract

This paper presents a study on acoustic modeling for speech recognition of predominantly monosyllabic languages. Various speech units used in speech recognition systems are investigated. Thai is selected to evaluate the effectiveness of these acoustic models, since it is a predominantly monosyllabic language with a complex vowel system. Several experiments were carried out to find the speech unit that yields an accurate acoustic model and a higher recognition rate, and the recognition rates obtained under the different acoustic models are reported and compared. In addition, this paper proposes a new speech unit for speech recognition, the onset-rhyme unit, together with two models: the Phonotactic Onset-Rhyme Model (PORM) and the Contextual Onset-Rhyme Model (CORM). The models comprise pairs of onset and rhyme units, where one onset and one rhyme make up a syllable. An onset comprises an initial consonant and its transition towards the following vowel; the rhyme that accompanies it consists of a steady vowel segment and a final consonant. Experimental results show that the onset-rhyme model improves on the other speech units. For speaker-dependent systems using only an acoustic model, it improves on the accuracy of the inter-syllable triphone model by nearly 9.3% and on that of the context-dependent Initial-Final model by nearly 4.7%; for speaker-dependent systems using both an acoustic model and a language model, the improvements are 5.6% and 4.5%, respectively. The onset-rhyme models thus attain a high recognition rate and are also more efficient in terms of system complexity.
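
To make the onset-rhyme decomposition concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a syllable is already segmented into an initial consonant, a vowel, and an optional final consonant. The `Syllable` type, the function names, and the label notation (`k(aa)` for an onset that carries its transition into the following vowel, `aan` for a rhyme) are all hypothetical, and the sketch does not attempt to distinguish PORM from CORM, whose details the abstract does not give.

```python
from typing import List, NamedTuple, Optional, Tuple


class Syllable(NamedTuple):
    """Hypothetical pre-segmented syllable (not a structure from the paper)."""
    initial: str                 # initial consonant, e.g. "k"
    vowel: str                   # vowel, e.g. "aa"
    final: Optional[str] = None  # final consonant, if any, e.g. "n"


def to_onset_rhyme(syl: Syllable) -> Tuple[str, str]:
    """Split one syllable into an (onset, rhyme) pair of unit labels.

    Onset: the initial consonant plus its transition towards the following
    vowel, written here (by assumption) as "consonant(vowel)".
    Rhyme: the steady vowel segment plus the final consonant, if present.
    """
    onset = f"{syl.initial}({syl.vowel})"
    rhyme = syl.vowel + (syl.final or "")
    return onset, rhyme


def unit_sequence(syllables: List[Syllable]) -> List[str]:
    """Flatten a syllable sequence into alternating onset and rhyme labels."""
    units: List[str] = []
    for syl in syllables:
        units.extend(to_onset_rhyme(syl))
    return units


if __name__ == "__main__":
    # Two made-up syllables, used only to exercise the functions.
    example = [Syllable("m", "a"), Syllable("k", "aa", "n")]
    print(unit_sequence(example))
```

Under these assumptions every syllable maps to exactly two units, which is one way to read the abstract's remark that the onset-rhyme models are efficient in terms of system complexity.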