Journal: IEICE Transactions on Information and Systems

Orthogonalized Distinctive Phonetic Feature Extraction for Noise-Robust Automatic Speech Recognition


Abstract

In this paper, we propose a noise-robust automatic speech recognition system that uses orthogonalized distinctive phonetic features (DPFs) as input to HMMs with diagonal covariance. The orthogonalized DPF extraction stage proceeds in three steps. First, a speech signal is converted to acoustic features composed of local features (LFs) and ΔP. A multilayer neural network (MLN) then maps the LFs to DPFs; its 15×3 output units form context-dependent DPFs consisting of a preceding-context DPF vector, a current DPF vector, and a following-context DPF vector. Next, the Karhunen-Loève transform (KLT) is applied to orthogonalize each DPF vector in the context-dependent DPFs, using orthogonal bases calculated from a DPF vector that represents 38 Japanese phonemes. Finally, the orthogonalized DPF vectors are decorrelated from one another using the Gram-Schmidt orthogonalization procedure. In experiments, after evaluating the parameters of the MLN input and output units in the DPF extractor, we compare the orthogonalized DPFs with the original DPFs. The orthogonalized DPFs are then evaluated against a standard parameter set of MFCCs and dynamic features, and noise robustness is tested using four types of additive noise. The experimental results show that the proposed orthogonalized DPFs significantly reduce the error rate in an isolated spoken-word recognition task, both with clean speech and with speech contaminated by additive noise. Furthermore, we achieved significant improvements when combining the orthogonalized DPFs with conventional static MFCCs and ΔP.
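
The two orthogonalization steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 38×15 phoneme table and the MLN output are random stand-ins, the function names (`klt_basis`, `apply_klt`, `gram_schmidt`) are hypothetical, and the abstract does not specify exactly how Gram-Schmidt is applied across the three context vectors, so the sketch simply orthonormalizes the three KLT-projected vectors of one frame.

```python
import numpy as np


def klt_basis(phoneme_dpfs):
    """Return the mean and an orthonormal KLT basis (as columns).

    phoneme_dpfs: array of shape (num_phonemes, num_features).
    The basis vectors are eigenvectors of the feature covariance,
    sorted by decreasing eigenvalue.
    """
    mean = phoneme_dpfs.mean(axis=0)
    cov = np.cov(phoneme_dpfs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def apply_klt(dpf_vectors, mean, basis):
    """Project mean-centered DPF vectors (rows) onto the KLT basis."""
    return (dpf_vectors - mean) @ basis


def gram_schmidt(vectors, eps=1e-10):
    """Orthonormalize the rows of `vectors` with modified Gram-Schmidt.

    Rows that are (numerically) linearly dependent on earlier rows
    are dropped.
    """
    ortho = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in ortho:
            w -= np.dot(w, u) * u
        norm = np.linalg.norm(w)
        if norm > eps:
            ortho.append(w / norm)
    return np.array(ortho)


rng = np.random.default_rng(0)

# Assumption: a binary table of 15 DPFs for 38 phonemes (random stand-in).
phoneme_table = rng.integers(0, 2, size=(38, 15)).astype(float)
mean, basis = klt_basis(phoneme_table)

# Assumption: MLN output for one frame, 3 context slots x 15 DPFs
# (preceding, current, following); random values stand in for real output.
mln_output = rng.random((3, 15))

klt_output = apply_klt(mln_output, mean, basis)   # per-vector KLT
context_ortho = gram_schmidt(klt_output)          # across the 3 vectors
```

Projecting the phoneme table itself through `apply_klt` yields features whose covariance is diagonal, which is the property that makes the orthogonalized features a natural fit for diagonal-covariance HMMs.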