Journal: Circuits, Systems and Signal Processing

DNN-HMM-Based Speaker-Adaptive Emotion Recognition Using MFCC and Epoch-Based Features



Abstract

Speech emotion recognition (SER) systems are often evaluated in a speaker-independent manner. However, the variation in acoustic features between the speakers used for training and those used for evaluation causes a significant drop in accuracy during evaluation. While speaker-adaptive techniques have been used for speech recognition, to the best of our knowledge they have not been employed for emotion recognition. Motivated by this, a speaker-adaptive DNN-HMM-based SER system is proposed in this paper. The feature-space maximum likelihood linear regression (fMLLR) technique is used for speaker adaptation during both the training and testing phases. The proposed system uses MFCC and epoch-based features. We exploit our earlier work on robust detection of epochs from emotional speech to obtain emotion-specific epoch-based features, namely instantaneous pitch, phase, and strength of excitation. The combined feature set improves by 5.07% on the MFCC features, which have been the baseline for SER systems in the literature, and by 7.13% over state-of-the-art techniques. Using the MFCC features alone, the proposed model improves upon state-of-the-art techniques by 2.06%. These results bring out the importance of speaker adaptation for SER systems and highlight the complementary nature of MFCC and epoch-based features for emotion recognition from speech. All experiments were carried out on the IEMOCAP emotional dataset.
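
The two ideas named in the abstract, combining MFCC with epoch-based features and adapting features per speaker, can be illustrated with a minimal sketch. Feature-space maximum likelihood linear regression applies a speaker-specific affine transform x' = A x + b to each feature frame. The abstract gives no implementation details, so everything below is an assumption made for illustration: the function names `combine_features` and `apply_fmllr`, the 13-dimensional MFCC and 3-dimensional epoch vectors, and the random placeholder values for A and b. In a real system A and b would be estimated per speaker by maximizing likelihood under the acoustic model, which this sketch does not attempt.

```python
import numpy as np


def combine_features(mfcc, epoch):
    """Concatenate per-frame MFCC and epoch-based features along the feature axis."""
    if mfcc.shape[0] != epoch.shape[0]:
        raise ValueError("MFCC and epoch features must have the same number of frames")
    return np.hstack([mfcc, epoch])


def apply_fmllr(features, A, b):
    """Apply a speaker-specific affine transform x' = A x + b to every frame (row)."""
    return features @ A.T + b


# Hypothetical data: random numbers stand in for real acoustic features.
rng = np.random.default_rng(0)
num_frames = 5
mfcc = rng.normal(size=(num_frames, 13))   # assumed 13-dimensional MFCC per frame
epoch = rng.normal(size=(num_frames, 3))   # assumed: pitch, phase, strength of excitation

x = combine_features(mfcc, epoch)          # one 16-dimensional vector per frame
dim = x.shape[1]

# Placeholder transform for one speaker; a real fMLLR transform is estimated
# from that speaker's data, not drawn at random.
A = np.eye(dim) + 0.01 * rng.normal(size=(dim, dim))
b = 0.1 * rng.normal(size=dim)

x_adapted = apply_fmllr(x, A, b)
```

With the identity matrix for A and a zero vector for b the features pass through unchanged, which is a convenient sanity check before plugging in estimated transforms.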
