Discriminative piecewise linear transformation based on deep learning for noise robust automatic speech recognition


Abstract

In this paper, we propose using deep neural networks to extend conventional statistical feature enhancement methods based on piecewise linear transformation. Stereo-based piecewise linear compensation for environments (SPLICE), a powerful statistical approach to feature enhancement, models the probability distribution of noisy input features as a mixture of Gaussians. However, the soft assignment of an input vector to the resulting regions is sometimes inaccurate, so the vector undergoes an inappropriate conversion. Performance degrades easily, especially when the conversion has to be linear. Feature enhancement using neural networks is another powerful approach, which can directly model the non-linear relationship between the noisy and clean feature spaces. However, it tends to suffer from over-fitting. We attempt to mitigate this problem by reducing the number of model parameters to be estimated. We train a neural network whose output layer is associated with states in the clean feature space rather than in the noisy feature space. This strategy makes the size of the output layer independent of the type of noisy environment. First, we characterize the distribution of clean features as a Gaussian mixture model; then, using deep neural networks, we discriminatively estimate the state in the clean space to which a noisy input feature corresponds. Experimental evaluations on the Aurora 2 dataset show that the proposed method outperforms the conventional methods.
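To make the idea of posterior-weighted piecewise linear enhancement concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the diagonal-covariance GMM, and the per-region parameters `A` and `b` are assumptions chosen for illustration. It computes region posteriors from a GMM and forms the enhanced feature as a posterior-weighted sum of per-region linear transforms. In the approach the abstract describes, the posteriors would instead come from a deep neural network that classifies the noisy input into states of a GMM trained on clean features.

```python
import numpy as np


def gmm_posteriors(y, weights, means, variances):
    """Posterior p(k | y) under a diagonal-covariance GMM (illustrative).

    y: (D,), weights: (K,), means: (K, D), variances: (K, D).
    """
    diff = y - means
    # Per-component Gaussian log-likelihood with diagonal covariance.
    log_lik = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)  # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


def piecewise_linear_enhance(y, posteriors, A, b):
    """Return sum_k p(k | y) * (A_k y + b_k).

    posteriors: (K,), A: (K, D, D), b: (K, D).
    """
    per_region = np.einsum("kij,j->ki", A, y) + b  # (K, D)
    return posteriors @ per_region


if __name__ == "__main__":
    # Toy parameters, hypothetical and not taken from the paper.
    y = np.array([1.0, -2.0])
    weights = np.array([0.5, 0.5])
    means = np.array([[1.0, -2.0], [10.0, 10.0]])
    variances = np.ones((2, 2))
    A = np.stack([np.eye(2), 2.0 * np.eye(2)])
    b = np.array([[0.5, 0.5], [0.0, 0.0]])

    post = gmm_posteriors(y, weights, means, variances)
    print(piecewise_linear_enhance(y, post, A, b))
```

Because `piecewise_linear_enhance` only needs a posterior vector, any classifier output that sums to one (for example, a softmax layer over clean-space states) can be passed in place of `gmm_posteriors`.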


Bibliographic record

Source
IEEE Workshop on Automatic Speech Recognition and Understanding | 2013 | pp. 350-355 | 6 pages
Authors
Kashiwagi Yosuke; Saito Daisuke; Minematsu Nobuaki; Hirose Keikichi
Original format
PDF
Keywords
Automatic speech recognition; deep learning; noise robustness; feature enhancement

