Journal: IEEE Journal of Selected Topics in Signal Processing

Improved Speech Presence Probabilities Using HMM-Based Inference, With Applications to Speech Enhancement and ASR

Abstract

This paper presents a technique for determining improved speech presence probabilities (SPPs) by exploiting the temporal correlation present in spectral speech data. Based on a set of traditional SPPs, we estimate the underlying speech presence probability via statistical inference. Traditional SPPs are assumed to be observations of channel-specific two-state Markov models. The corresponding steady-state and transition statistics are set to capture the well-known temporal correlation of spectral speech data, and observation statistics are modeled based on the effect of additive acoustic noise on the resulting SPPs. Once the underlying models have been parameterized, improved speech presence probabilities can be estimated via traditional inference techniques, such as the forward or forward-backward algorithms. The two-state configuration of the underlying signal models enables low-complexity HMM-based processing that only slightly increases complexity relative to standard SPPs, making the proposed framework attractive for resource-constrained scenarios. The proposed SPP masks are shown to provide a significant increase in accuracy relative to the state-of-the-art method of [12], in terms of the mean pointwise Kullback-Leibler (KL) distance. When applied to soft-decision speech enhancement, the proposed SPPs improve segmental SNRs. Closer analysis reveals significantly decreased noise leakage, at the cost of increased speech distortion. When applied to automatic speech recognition (ASR), soft-decision enhancement with the proposed SPPs provides increased recognition performance relative to [12].
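The forward-algorithm step mentioned in the abstract can be illustrated with a minimal sketch for a single frequency channel. Everything specific here is an assumption for illustration and is not taken from the paper: the function name `forward_spp`, the transition probabilities (0.9), and in particular the observation model, which simply treats each traditional SPP value as soft evidence (likelihood proportional to `p` under speech and `1 - p` under noise). The paper models observation statistics from the effect of additive noise on the SPPs, which this sketch does not reproduce.

```python
def forward_spp(raw_spps, p_stay_speech=0.9, p_stay_noise=0.9, eps=1e-6):
    """Filter one channel's SPP track with a two-state HMM forward pass.

    States: noise-only and speech-present. All parameter values are
    hypothetical defaults chosen for illustration.
    """
    p_enter = 1.0 - p_stay_noise    # P(noise -> speech)
    p_leave = 1.0 - p_stay_speech   # P(speech -> noise)
    # Start from the steady-state speech probability of the Markov chain.
    post = p_enter / (p_enter + p_leave)
    filtered = []
    for p in raw_spps:
        # Clamp so a raw SPP of exactly 0 or 1 cannot zero out a state.
        p = min(max(p, eps), 1.0 - eps)
        # Prediction: propagate the previous posterior through the transitions.
        pred = post * p_stay_speech + (1.0 - post) * p_enter
        # Update: assumed soft-evidence likelihoods, p for speech, 1 - p for noise.
        num = pred * p
        post = num / (num + (1.0 - pred) * (1.0 - p))
        filtered.append(post)
    return filtered


if __name__ == "__main__":
    # A single-frame dip inside a speech segment is pulled back toward speech.
    print(forward_spp([0.9, 0.9, 0.2, 0.9, 0.9]))
```

Each frame costs a handful of multiplications per channel, which is consistent with the abstract's point that a two-state model adds little complexity. A forward-backward variant would add a second pass over the frames and therefore requires the whole utterance, or a lookahead buffer.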
