International Conference on Affective Computing and Intelligent Interaction (ACII 2007); 12-14 September 2007; Lisbon, Portugal

Frame vs. Turn-Level: Emotion Recognition from Speech Considering Static and Dynamic Processing

Abstract

In contrast to the predominant approach of turn-wise statistics over acoustic Low-Level Descriptors followed by static classification, we re-investigate dynamic modeling directly at the frame level for speech-based emotion recognition. This seems beneficial, as it is well known that important information exists on temporal layers below the turn. Most promisingly, we integrate this frame-level information into a state-of-the-art large-feature-space emotion recognition engine. To investigate frame-level processing we employ a typical speaker-recognition setup tailored to emotion classification: a GMM for classification, with MFCCs plus speed and acceleration coefficients as features. We also consider the use of multiple states, i.e. an HMM. To fuse this information with turn-based modeling, the output scores are appended to a super-vector together with static acoustic features. A variety of Low-Level Descriptors and functionals covering prosodic, speech-quality, and articulatory aspects are considered. Starting from 1.4k features, we select optimal configurations both including and excluding the GMM information. The final decision is made by an SVM. Extensive test runs are carried out on two popular public databases, EMO-DB and SUSAS, to investigate acted and spontaneous data. As we face the current challenge of speaker-independent analysis, we also discuss the benefits of speaker normalization. The results clearly emphasize the superior power of integrating diverse time levels.
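The speaker-normalization step mentioned for speaker-independent analysis is not detailed in the abstract; one common realization is per-speaker z-normalization of the turn-level features, sketched below under that assumption (the function name and the epsilon guard are illustrative).

```python
import numpy as np

def speaker_znorm(features, speaker_ids):
    """Z-normalize every feature dimension within each speaker's turns."""
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for s in np.unique(speaker_ids):
        idx = speaker_ids == s
        mu = features[idx].mean(axis=0)
        sd = features[idx].std(axis=0) + 1e-9  # guard against zero variance
        out[idx] = (features[idx] - mu) / sd
    return out
```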
