Published in: IEEE Workshop on Automatic Speech Recognition and Understanding

Discriminative piecewise linear transformation based on deep learning for noise robust automatic speech recognition



Abstract

In this paper, we propose using deep neural networks to extend conventional statistical feature enhancement methods based on piecewise linear transformation. Stereo-based piecewise linear compensation for environments (SPLICE), a powerful statistical approach to feature enhancement, models the probability distribution of noisy input features as a mixture of Gaussians. However, the soft assignment of an input vector to the resulting regions is sometimes inaccurate, so the vector undergoes an inappropriate conversion. Performance degrades easily, especially when the conversion is constrained to be linear. Feature enhancement using neural networks is another powerful approach, one that can directly model the non-linear relationship between the noisy and clean feature spaces, but it tends to suffer from over-fitting. We attempt to mitigate this problem by reducing the number of model parameters to be estimated. We train a neural network whose output layer is associated with states in the clean feature space rather than in the noisy feature space, which makes the size of the output layer independent of the type of noisy environment. We first characterize the distribution of clean features with a Gaussian mixture model and then use deep neural networks to discriminatively estimate the state in the clean space to which a noisy input feature corresponds. Experimental evaluations on the Aurora 2 dataset show that the proposed method outperforms the conventional methods it was compared with.
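The abstract does not give the transformation formula, so the following is only a minimal sketch of a generic piecewise linear enhancement step, assuming the enhanced feature is a posterior-weighted sum of per-region affine maps, x_hat = sum_k p(k | y) (A_k y + b_k). The function names, array shapes, and the use of a plain softmax as a stand-in for the deep neural network's output layer are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis.

    Stands in for the output layer of a network that scores clean-space
    states (hypothetical; the paper's network is not specified here).
    """
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def piecewise_linear_enhance(y, posteriors, A, b):
    """Posterior-weighted piecewise linear transformation (assumed form).

    y          : (T, D) noisy feature vectors
    posteriors : (T, K) p(k | y) for each of K regions/states
    A          : (K, D, D) per-region linear maps
    b          : (K, D) per-region bias vectors
    returns    : (T, D) enhanced feature vectors
    """
    # Apply every region's affine map to every frame: shape (T, K, D).
    per_region = np.einsum("kij,tj->tki", A, y) + b
    # Blend the K candidate outputs with the soft assignments.
    return np.einsum("tk,tki->ti", posteriors, per_region)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K = 5, 3, 4
    y = rng.normal(size=(T, D))
    # Random scores play the role of network outputs over K clean states.
    posteriors = softmax(rng.normal(size=(T, K)))
    A = np.stack([np.eye(D)] * K)   # identity maps: bias-only correction
    b = rng.normal(size=(K, D))
    print(piecewise_linear_enhance(y, posteriors, A, b).shape)
```

In this sketch the number of regions K is fixed by the clean-space model, which mirrors the abstract's point that the output layer size does not depend on the noise environment. How the posteriors and the per-region parameters are actually trained is outside what the abstract states.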
