IEICE Transactions on Information and Systems

Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers

Abstract

Among various training concepts for speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to standard Hidden Markov Model (HMM) speech recognizers, whose states are associated with Gaussian Mixture Models (GMMs). On the other hand, focusing on the high discriminative power of Deep Neural Networks (DNNs), a new type of speech recognizer structure that combines DNNs and HMMs has been vigorously investigated in the speaker adaptation research field. Along these two lines, it is natural to conceive of further improving a DNN-HMM recognizer by employing the training concept of SAT. In this paper, we propose a novel speaker adaptation scheme that applies SAT to a DNN-HMM recognizer. Our SAT scheme allocates a Speaker Dependent (SD) module to one of the intermediate layers of the DNN, treats its remaining layers as a Speaker Independent (SI) module, and jointly trains the SD and SI modules while switching the SD module in a speaker-by-speaker manner. We implement the scheme using a DNN-HMM recognizer whose DNN has seven layers, and evaluate its utility on TED Talks corpus data. Our experimental results show that in the supervised adaptation scenario, our Speaker-Adapted (SA) SAT-based recognizer reduces the word error rate of the baseline SI recognizer and the lowest word error rate of the SA SI recognizer by 8.4% and 0.7%, respectively, and by 6.4% and 0.6% in the unsupervised adaptation scenario. The error reductions gained by our SA SAT-based recognizers proved to be statistically significant. The results also show that our SAT-based adaptation outperforms its SI-based counterpart regardless of which layer is selected for the SD module, and that the inner layers of the DNN seem more suitable for SD module allocation than the outer layers.
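To make the module layout concrete, the following is a minimal PyTorch sketch of how such a SAT-trained DNN could be organized: one intermediate layer slot holds a per-speaker SD module that is switched speaker-by-speaker, while the remaining layers form the shared SI module, and both are trained jointly. The layer sizes, speaker IDs, sigmoid activation, and training-step details below are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class SATDNN(nn.Module):
    """DNN acoustic model with one speaker-dependent (SD) layer slot.

    All other layers are speaker-independent (SI) and shared across speakers;
    the SD module is switched speaker-by-speaker during SAT.
    """
    def __init__(self, in_dim=440, hidden_dim=512, num_states=2000,
                 num_layers=7, sd_layer_index=3, speaker_ids=("spkA", "spkB")):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [num_states]
        self.num_layers = num_layers
        self.sd_layer_index = sd_layer_index
        # SI module: every layer except the slot reserved for the SD module.
        self.si_layers = nn.ModuleDict({
            str(i): nn.Linear(dims[i], dims[i + 1])
            for i in range(num_layers) if i != sd_layer_index
        })
        # One SD module per training speaker, all occupying the same
        # intermediate-layer position in the network.
        self.sd_modules = nn.ModuleDict({
            spk: nn.Linear(dims[sd_layer_index], dims[sd_layer_index + 1])
            for spk in speaker_ids
        })

    def forward(self, x, speaker_id):
        for i in range(self.num_layers):
            layer = (self.sd_modules[speaker_id] if i == self.sd_layer_index
                     else self.si_layers[str(i)])
            x = layer(x)
            if i < self.num_layers - 1:      # hidden layers use a nonlinearity
                x = torch.sigmoid(x)
        return x                             # senone logits for the hybrid HMM

# Joint SAT step: each minibatch comes from a single speaker, so the forward
# pass routes through that speaker's SD module while the SI layers are shared;
# backpropagation updates both the SI module and that speaker's SD module.
model = SATDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
feats = torch.randn(8, 440)                  # dummy acoustic feature vectors
targets = torch.randint(0, 2000, (8,))       # dummy HMM state (senone) targets
logits = model(feats, speaker_id="spkA")
loss = nn.functional.cross_entropy(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In a typical SAT setup, adaptation to a test speaker would then keep the jointly trained SI layers fixed and estimate only a new SD module from that speaker's adaptation data.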
