Journal: EURASIP Journal on Advances in Signal Processing

A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation



Abstract

A reverberation-time-aware, deep-neural-network (DNN)-based multi-channel speech dereverberation framework is proposed to handle a wide range of reverberation times (RT60s). Designing a robust system involves three key steps. First, to perform speech dereverberation and beamforming simultaneously, we propose a framework, DNNSpatial, that selectively concatenates log-power spectral (LPS) input features of reverberant speech from multiple microphones in an array and maps them to the expected output LPS features of anechoic reference speech using a single DNN. Next, the temporal auto-correlation function of received signals at different RT60s is investigated to show that RT60-dependent temporal-spatial contexts in feature selection are needed in the DNNSpatial training stage to optimize system performance in diverse reverberant environments. Finally, the RT60 is estimated to select the proper temporal and spatial contexts before the LPS features are fed to the trained DNNs for speech dereverberation. The experimental evidence gathered in this study indicates that the proposed framework outperforms both the state-of-the-art signal processing dereverberation algorithm, weighted prediction error (WPE), and conventional DNNSpatial systems that do not take the reverberation time into account, even under extremely weak and severe reverberant conditions. The proposed technique generalizes well to unseen room sizes, array geometries, and loudspeaker positions, and is robust to reverberation time estimation errors.
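The input-feature construction described in the abstract (per-microphone LPS features, concatenated across channels and across neighboring frames, with the amount of context chosen from an estimated RT60) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the frame length and hop size, and especially the `PLACEHOLDER_CONTEXTS` table are assumptions made for this example. The abstract does not give the actual RT60-to-context mapping, so the thresholds and values below are arbitrary and do not reflect the paper's findings. The RT60 estimator and the DNN itself are omitted.

```python
import numpy as np

# Hypothetical lookup: (RT60 upper bound in seconds, frames of context on
# each side, number of microphones to use). Placeholder values only; the
# source does not specify this mapping.
PLACEHOLDER_CONTEXTS = [(0.3, 2, 2), (0.7, 4, 4), (float("inf"), 6, 6)]


def log_power_spectrum(signal, frame_len=512, hop=256, eps=1e-10):
    """Return LPS features with shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    spectrum = np.fft.rfft(frames, axis=1)
    # eps avoids log(0) on silent frames
    return np.log(np.abs(spectrum) ** 2 + eps)


def select_context(rt60_seconds, table=PLACEHOLDER_CONTEXTS):
    """Pick (half_context, num_mics) from the first row whose bound exceeds RT60."""
    for upper_bound, half_context, num_mics in table:
        if rt60_seconds < upper_bound:
            return half_context, num_mics
    raise ValueError("context table must end with an unbounded row")


def build_input_features(channels, rt60_seconds):
    """Concatenate LPS features over selected microphones and neighboring frames.

    channels: list of equal-length 1-D arrays, one per microphone.
    Returns an array of shape (num_frames, (2 * half_context + 1) * num_mics * bins).
    """
    half_context, num_mics = select_context(rt60_seconds)
    num_mics = min(num_mics, len(channels))
    # Spatial context: stack the LPS of the selected microphones side by side.
    per_frame = np.concatenate(
        [log_power_spectrum(ch) for ch in channels[:num_mics]], axis=1
    )
    # Temporal context: repeat edge frames so every frame has full context.
    padded = np.pad(per_frame, ((half_context, half_context), (0, 0)), mode="edge")
    width = 2 * half_context + 1
    return np.stack(
        [padded[t : t + width].reshape(-1) for t in range(per_frame.shape[0])]
    )
```

Each row of the returned array would serve as one input vector to the trained network, with the target being the LPS frame of the anechoic reference speech. Keeping the context table as data rather than hard-coding it makes it straightforward to substitute values tuned on real measurements.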
