Journal: EURASIP Journal on Applied Signal Processing

Robust Distant Speech Recognition by Combining Multiple Microphone-Array Processing with Position-Dependent CMN

Abstract

We propose a robust distant speech recognition method that combines multiple microphone-array processing with position-dependent cepstral mean normalization (CMN). In the recognition stage, the system estimates the speaker position and adopts the compensation parameters, estimated a priori, that correspond to the estimated position. The system then applies CMN to the speech (i.e., position-dependent CMN) and performs speech recognition for each channel. The features obtained from the multiple channels are integrated by one of two types of processing. The first method uses the maximum vote or the maximum summation likelihood of the recognition results from the multiple channels to obtain the final result; this is called multiple-decoder processing. The second method calculates the output probability of each input at the frame level, and a single decoder uses these output probabilities to perform speech recognition; this is called single-decoder processing and has a lower computational cost. We combine delay-and-sum beamforming with multiple-decoder processing or single-decoder processing, and term the combination multiple microphone-array processing. We evaluated the proposed method on a limited-vocabulary (100 words) distant isolated-word recognition task in a real environment. The proposed multiple microphone-array processing using multiple decoders with position-dependent CMN achieved a 3.2% improvement (50% relative error reduction rate) over delay-and-sum beamforming with conventional CMN (i.e., the conventional method). The multiple microphone-array processing using a single decoder needs about one-third of the computational time of that using multiple decoders, without degrading speech recognition performance.
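
The combination schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name, array shape, and toy value below is hypothetical. In particular, the abstract does not state how the frame-level output probabilities are merged for the single decoder, so taking the per-frame maximum across channels is an assumption made only for illustration. Speaker localization, delay-and-sum beamforming, and the decoder itself are left out.

```python
from collections import Counter

import numpy as np


def position_dependent_cmn(cepstra, mean_by_position, position):
    """Subtract the cepstral mean stored a priori for the estimated position.

    cepstra: array of shape (frames, coefficients).
    mean_by_position: dict mapping a position label to a (coefficients,) mean.
    """
    return cepstra - mean_by_position[position]


def maximum_vote(words):
    """Multiple-decoder processing: pick the word most channels agreed on."""
    # Ties are resolved in favor of the word seen first (Counter behavior).
    return Counter(words).most_common(1)[0][0]


def maximum_summation_likelihood(log_likelihoods, vocabulary):
    """Multiple-decoder processing: sum per-word scores over channels.

    log_likelihoods: array of shape (channels, words).
    """
    totals = np.asarray(log_likelihoods).sum(axis=0)
    return vocabulary[int(np.argmax(totals))]


def single_decoder_scores(frame_log_probs):
    """Single-decoder processing: merge channels before decoding.

    frame_log_probs: array of shape (channels, frames, states).
    ASSUMPTION: the per-frame maximum across channels is used as the merge rule.
    """
    return np.max(np.asarray(frame_log_probs), axis=0)


# Toy usage with made-up numbers.
vocabulary = ["left", "right", "stop"]
means = {"A": np.array([1.0, 1.0])}
normalized = position_dependent_cmn(np.array([[1.0, 2.0], [3.0, 4.0]]), means, "A")
voted = maximum_vote(["left", "stop", "left"])
summed = maximum_summation_likelihood(
    [[-10.0, -12.0, -11.0], [-13.0, -9.0, -12.0], [-11.0, -10.5, -12.0]],
    vocabulary,
)
merged = single_decoder_scores([[[-1.0, -5.0]], [[-3.0, -2.0]]])
print(normalized, voted, summed, merged)
```

In this sketch the multiple-decoder variants need one full decoding pass per channel before their results are combined, whereas the single-decoder variant merges the per-frame scores first and decodes once, which is consistent with the lower computational cost reported in the abstract.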