EURASIP Journal on Advances in Signal Processing

Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation

Abstract

This paper investigates deep neural network (DNN) based nonlinear feature mapping and statistical linear feature adaptation approaches for reducing reverberation in speech signals. In the nonlinear feature mapping approach, a DNN is trained on a parallel clean/distorted speech corpus to map reverberant and noisy speech coefficients (such as the log magnitude spectrum) to the underlying clean speech coefficients. The constraints imposed by dynamic features (i.e., the time derivatives of the speech coefficients) are used to enhance the smoothness of the predicted coefficient trajectories in two ways. One is to obtain the enhanced speech coefficients through a least-squares estimation from the coefficients and dynamic features predicted by the DNN. The other is to incorporate the dynamic feature constraint directly into the DNN training process using a sequential cost function. In the linear feature adaptation approach, a sparse linear transform, called the cross transform, is used to transform multiple frames of speech coefficients into a new feature space. The transform is estimated to maximize the likelihood of the transformed coefficients given a model of clean speech coefficients. Unlike the DNN approach, no parallel corpus is used and no assumption about distortion types is made. The two approaches are evaluated on the REVERB Challenge 2014 tasks. Both speech enhancement and automatic speech recognition (ASR) results show that the DNN-based mappings significantly reduce the reverberation in speech and improve both speech quality and ASR performance. For the speech enhancement task, the proposed dynamic feature constraints help to improve the cepstral distance, frequency-weighted segmental signal-to-noise ratio (SNR), and log-likelihood ratio metrics, while moderately degrading the speech-to-reverberation modulation energy ratio. In addition, the cross transform feature adaptation significantly improves ASR performance for clean-condition trained acoustic models.
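
The least-squares smoothing step described in the abstract can be illustrated with a small sketch. This is a minimal example, not the paper's implementation: it assumes a simple first-difference delta operator with unit weighting, and solves for a single coefficient trajectory from hypothetical DNN-predicted static and dynamic values.

```python
import numpy as np

def delta_matrix(num_frames):
    """First-difference operator D with (D c)[t] = c[t] - c[t-1]; a simple
    stand-in for the regression-window deltas used in practice (assumption)."""
    D = np.eye(num_frames) - np.eye(num_frames, k=-1)
    D[0, :] = 0.0  # no delta constraint on the first frame
    return D

def smooth_trajectory(y_static, y_delta):
    """Least-squares estimate of one coefficient trajectory c from predicted
    static (y_static) and dynamic (y_delta) coefficients:

        c* = argmin_c ||y_static - c||^2 + ||y_delta - D c||^2
           = (I + D^T D)^{-1} (y_static + D^T y_delta)
    """
    T = len(y_static)
    D = delta_matrix(T)
    A = np.eye(T) + D.T @ D
    b = y_static + D.T @ y_delta
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    # Toy trajectory for a single log-magnitude coefficient over 50 frames.
    rng = np.random.default_rng(0)
    clean = np.sin(np.linspace(0.0, 3.0, 50))
    y_static = clean + 0.3 * rng.standard_normal(50)  # noisy "DNN" static output
    y_delta = np.diff(clean, prepend=clean[0])        # "DNN" dynamic output
    enhanced = smooth_trajectory(y_static, y_delta)
    print("static-only MSE:", np.mean((y_static - clean) ** 2))
    print("smoothed    MSE:", np.mean((enhanced - clean) ** 2))
```

In the paper's setting this kind of smoothing would be applied per coefficient dimension across an utterance; the second variant described in the abstract instead folds the same dynamic feature constraint into the DNN cost function during training.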
