IEICE Transactions on Information and Systems

Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers

Abstract

Among various training concepts for speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to standard Hidden Markov Model (HMM) speech recognizers, whose states are associated with Gaussian Mixture Models (GMMs). On the other hand, focusing on the high discriminative power of Deep Neural Networks (DNNs), a new type of speech recognizer structure that combines DNNs and HMMs has been vigorously investigated in the speaker adaptation research field. Along these two lines, it is natural to conceive of further improving a DNN-HMM recognizer by employing the training concept of SAT. In this paper, we propose a novel speaker adaptation scheme that applies SAT to a DNN-HMM recognizer. Our SAT scheme allocates a Speaker Dependent (SD) module to one of the intermediate layers of the DNN, treats its remaining layers as a Speaker Independent (SI) module, and jointly trains the SD and SI modules while switching the SD module in a speaker-by-speaker manner. We implement the scheme using a DNN-HMM recognizer whose DNN has seven layers, and evaluate its utility on TED Talks corpus data. Our experimental results show that in the supervised adaptation scenario, our Speaker-Adapted (SA) SAT-based recognizer reduces the word error rate of the baseline SI recognizer and the lowest word error rate of the SA SI recognizer by 8.4% and 0.7%, respectively, and by 6.4% and 0.6% in the unsupervised adaptation scenario. The error reductions gained by our SA SAT-based recognizers proved to be statistically significant. The results also show that our SAT-based adaptation outperforms its SI-based counterpart regardless of which layer is selected for the SD module, and that the inner layers of the DNN seem more suitable for SD module allocation than the outer layers.
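To make the module layout concrete, the following is a minimal PyTorch sketch of how such a SAT-trained DNN could be organized: one intermediate layer slot holds a per-speaker SD module that is switched speaker-by-speaker, while the remaining layers form the shared SI module, and both are trained jointly. The layer sizes, speaker IDs, sigmoid activation, and training-step details below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class SATDNN(nn.Module):
    """DNN acoustic model with one speaker-dependent (SD) layer slot.

    All other layers are speaker-independent (SI) and shared across speakers;
    the SD module is switched speaker-by-speaker during SAT.
    """
    def __init__(self, in_dim=440, hidden_dim=512, num_states=2000,
                 num_layers=7, sd_layer_index=3, speaker_ids=("spkA", "spkB")):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [num_states]
        self.num_layers = num_layers
        self.sd_layer_index = sd_layer_index
        # SI module: every layer except the slot reserved for the SD module.
        self.si_layers = nn.ModuleDict({
            str(i): nn.Linear(dims[i], dims[i + 1])
            for i in range(num_layers) if i != sd_layer_index
        })
        # One SD module per training speaker, all occupying the same
        # intermediate-layer position in the network.
        self.sd_modules = nn.ModuleDict({
            spk: nn.Linear(dims[sd_layer_index], dims[sd_layer_index + 1])
            for spk in speaker_ids
        })

    def forward(self, x, speaker_id):
        for i in range(self.num_layers):
            layer = (self.sd_modules[speaker_id] if i == self.sd_layer_index
                     else self.si_layers[str(i)])
            x = layer(x)
            if i < self.num_layers - 1:      # hidden layers use a nonlinearity
                x = torch.sigmoid(x)
        return x                             # senone logits for the hybrid HMM

# Joint SAT step: each minibatch comes from a single speaker, so the forward
# pass routes through that speaker's SD module while the SI layers are shared;
# backpropagation updates both the SI module and that speaker's SD module.
model = SATDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
feats = torch.randn(8, 440)                  # dummy acoustic feature vectors
targets = torch.randint(0, 2000, (8,))       # dummy HMM state (senone) targets
logits = model(feats, speaker_id="spkA")
loss = nn.functional.cross_entropy(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In a typical SAT setup, adaptation to a test speaker would then keep the jointly trained SI layers fixed and estimate only a new SD module from that speaker's adaptation data.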
