...
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition

Abstract

Fast adaptation of deep neural networks (DNNs) is an important research topic in deep learning. In this paper, we propose a general adaptation scheme for DNNs based on discriminant condition codes, which are fed directly to various layers of a pre-trained DNN through a new set of connection weights. Moreover, we present several training methods to learn these connection weights from training data, as well as the corresponding adaptation methods to learn a new condition code from adaptation data for each new test condition. In this work, the fast adaptation scheme is applied to supervised speaker adaptation in speech recognition, based on either a frame-level cross-entropy or a sequence-level maximum mutual information training criterion. We propose three ways to apply this adaptation scheme based on the so-called speaker codes: i) nonlinear feature normalization in feature space; ii) direct model adaptation of the DNN based on speaker codes; iii) joint speaker-adaptive training with speaker codes. We evaluate the proposed adaptation methods on two standard speech recognition tasks, namely TIMIT phone recognition and large-vocabulary speech recognition on the Switchboard task. Experimental results show that all three methods adapt large DNN models effectively using only a small amount of adaptation data. For example, the Switchboard results show that the proposed speaker-code-based adaptation methods achieve up to 8-10% relative error reduction using only a few dozen adaptation utterances per speaker. Finally, after speaker adaptation with the sequence training criterion, we achieve very good performance on Switchboard (12.1% WER), which is very close to the best performance reported on this task ("Deep convolutional neural networks for LVCSR," T. N. Sainath et al., Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013).
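The core mechanism described above — a condition (speaker) code fed into a hidden layer through an extra connection-weight matrix, then re-estimated per speaker by back-propagating the error into the code while all network weights stay frozen — can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, weights, and the adaptation data are random stand-ins, and a real system would adapt a full pre-trained acoustic model rather than this toy two-layer network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
d_in, d_hid, d_out, d_code = 20, 32, 5, 8

# Stand-ins for a pre-trained network; A1 holds the new connection
# weights that inject the speaker code into the hidden layer.
W1 = rng.normal(0.0, 0.1, (d_hid, d_in))
A1 = rng.normal(0.0, 0.1, (d_hid, d_code))
b1 = np.zeros(d_hid)
W2 = rng.normal(0.0, 0.1, (d_out, d_hid))
b2 = np.zeros(d_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, c):
    """Forward pass: the speaker code c enters the hidden layer via A1."""
    h = sigmoid(W1 @ x + A1 @ c + b1)
    return h, softmax(W2 @ h + b2)

def xent(frames, labels, c):
    """Mean frame-level cross-entropy of the adaptation set under code c."""
    return -np.mean([np.log(forward(x, c)[1][y])
                     for x, y in zip(frames, labels)])

def adapt_code(frames, labels, steps=200, lr=0.5):
    """Learn a speaker code from a few labeled adaptation frames by
    gradient descent, keeping all network weights (W1, A1, b1, W2, b2)
    frozen and updating only the code."""
    c = np.zeros(d_code)
    for _ in range(steps):
        grad = np.zeros(d_code)
        for x, y in zip(frames, labels):
            h, p = forward(x, c)
            dlogits = p.copy()
            dlogits[y] -= 1.0                    # dL/dlogits for cross-entropy
            dh = (W2.T @ dlogits) * h * (1 - h)  # back through sigmoid layer
            grad += A1.T @ dh                    # back into the code only
        c -= lr * grad / len(frames)
    return c

# Tiny random "adaptation set" for one speaker (illustrative only)
frames = rng.normal(size=(10, d_in))
labels = rng.integers(0, d_out, size=10)

loss_before = xent(frames, labels, np.zeros(d_code))
code = adapt_code(frames, labels)
loss_after = xent(frames, labels, code)
```

Adapting only the low-dimensional code (here 8 parameters) rather than the millions of DNN weights is what makes the scheme fast and data-efficient; the frozen connection weights A1 are learned once on multi-speaker training data.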
