...
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition

Abstract

Fast adaptation of deep neural networks (DNNs) is an important research topic in deep learning. In this paper, we propose a general adaptation scheme for DNNs based on discriminant condition codes, which are fed directly to various layers of a pre-trained DNN through a new set of connection weights. Moreover, we present several training methods to learn these connection weights from training data, as well as the corresponding adaptation methods to learn a new condition code from adaptation data for each new test condition. In this work, the fast adaptation scheme is applied to supervised speaker adaptation in speech recognition, based on either a frame-level cross-entropy or a sequence-level maximum mutual information training criterion. We propose three ways to apply this adaptation scheme based on the so-called speaker codes: i) nonlinear feature normalization in feature space; ii) direct model adaptation of the DNN based on speaker codes; iii) joint speaker-adaptive training with speaker codes. We evaluate the proposed adaptation methods on two standard speech recognition tasks, namely TIMIT phone recognition and large-vocabulary speech recognition on the Switchboard task. Experimental results show that all three methods adapt large DNN models effectively using only a small amount of adaptation data. For example, the Switchboard results show that the proposed speaker-code-based adaptation methods achieve up to 8-10% relative error reduction using only a few dozen adaptation utterances per speaker. Finally, after speaker adaptation with the sequence training criterion, we achieve very good performance on Switchboard (12.1% WER), which is very close to the best performance reported on this task ("Deep convolutional neural networks for LVCSR," T. N. Sainath et al., Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2013).
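The core mechanism described above — a condition (speaker) code fed into a hidden layer through an extra connection-weight matrix, then re-estimated per speaker by back-propagating the error into the code while all network weights stay frozen — can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, weights, and the adaptation data are random stand-ins, and a real system would adapt a full pre-trained acoustic model rather than this toy two-layer network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
d_in, d_hid, d_out, d_code = 20, 32, 5, 8

# Stand-ins for a pre-trained network; A1 holds the new connection
# weights that inject the speaker code into the hidden layer.
W1 = rng.normal(0.0, 0.1, (d_hid, d_in))
A1 = rng.normal(0.0, 0.1, (d_hid, d_code))
b1 = np.zeros(d_hid)
W2 = rng.normal(0.0, 0.1, (d_out, d_hid))
b2 = np.zeros(d_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, c):
    """Forward pass: the speaker code c enters the hidden layer via A1."""
    h = sigmoid(W1 @ x + A1 @ c + b1)
    return h, softmax(W2 @ h + b2)

def xent(frames, labels, c):
    """Mean frame-level cross-entropy of the adaptation set under code c."""
    return -np.mean([np.log(forward(x, c)[1][y])
                     for x, y in zip(frames, labels)])

def adapt_code(frames, labels, steps=200, lr=0.5):
    """Learn a speaker code from a few labeled adaptation frames by
    gradient descent, keeping all network weights (W1, A1, b1, W2, b2)
    frozen and updating only the code."""
    c = np.zeros(d_code)
    for _ in range(steps):
        grad = np.zeros(d_code)
        for x, y in zip(frames, labels):
            h, p = forward(x, c)
            dlogits = p.copy()
            dlogits[y] -= 1.0                    # dL/dlogits for cross-entropy
            dh = (W2.T @ dlogits) * h * (1 - h)  # back through sigmoid layer
            grad += A1.T @ dh                    # back into the code only
        c -= lr * grad / len(frames)
    return c

# Tiny random "adaptation set" for one speaker (illustrative only)
frames = rng.normal(size=(10, d_in))
labels = rng.integers(0, d_out, size=10)

loss_before = xent(frames, labels, np.zeros(d_code))
code = adapt_code(frames, labels)
loss_after = xent(frames, labels, code)
```

Adapting only the low-dimensional code (here 8 parameters) rather than the millions of DNN weights is what makes the scheme fast and data-efficient; the frozen connection weights A1 are learned once on multi-speaker training data.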
