An improved Gaussian mixture hidden conditional random fields model for audio-based emotions classification

Muhammad Hameed Siddiqi

Abstract

The analysis of human emotions plays a significant role in giving clinicians sufficient information about patients, since monitoring their feelings supports better management of their diseases. Audio-based emotion recognition has become an active research interest for such domains during the last decade. The performance of audio-based emotion systems depends mostly on the recognition stage. Existing models share a common issue, called the objectivity suppositions problem, which might decrease the recognition rate. Therefore, this study investigates an improved version of a classifier based on the hidden conditional random fields (HCRF) model to classify emotional speech. In this model, we introduce a novel methodology that incorporates multifaceted distributions by employing a mixture of full-covariance Gaussian density functions. Owing to this incorporation, the proposed model tackles most of the limitations of existing classifiers. Some well-known features, such as Mel-frequency cepstral coefficients (MFCC), are extracted in our experiments. The proposed model has been validated and evaluated on two publicly available datasets: the Berlin Database of Emotional Speech (Emo-DB) and the eNTERFACE’05 Audio-Visual Emotion dataset. For validation and comparison against existing techniques, we used a 10-fold cross-validation scheme. The proposed method achieved a significant improvement in classification, with a p-value < 0.03. Moreover, we also show that the proposed technique is computationally less expensive than state-of-the-art works.
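As a rough illustration of the density referred to in the abstract, the sketch below evaluates the log of a mixture of Gaussian components, each with a full (non-diagonal) covariance matrix. It is not the author's implementation: the function names `gaussian_log_density` and `mixture_log_density` are hypothetical, NumPy is assumed, and the way such a density would be embedded in the HCRF potentials is not shown.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log of a multivariate Gaussian density with a full covariance matrix."""
    d = mean.shape[0]
    diff = x - mean
    # The Cholesky factor gives a stable log-determinant and Mahalanobis term.
    chol = np.linalg.cholesky(cov)
    solved = np.linalg.solve(chol, diff)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + solved @ solved)

def mixture_log_density(x, weights, means, covs):
    """Log of a weighted sum of full-covariance Gaussian components."""
    log_terms = np.array([
        np.log(w) + gaussian_log_density(x, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    # Log-sum-exp keeps the sum stable when the terms are very negative.
    peak = np.max(log_terms)
    return peak + np.log(np.sum(np.exp(log_terms - peak)))
```

In practice `x` would be one frame-level feature vector (for example, an MFCC vector), and the weights, means, and covariances would be learned parameters. A full covariance matrix lets each component capture correlations between feature dimensions, which a diagonal covariance cannot.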
