Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition

Dung Nguyen; Kien Nguyen; Sridha Sridharan; David Dean; Clinton Fookes


Abstract

Multimodal emotion recognition has attracted great interest recently, and numerous methodologies have been investigated. However, the task requires effective fusion of multimodal representations in the audio and video domains, and existing approaches still perform poorly on this challenging task. This paper proposes a novel framework for recognizing emotion from multiple sources, including facial expression, pose, body movements, and voice. In this framework, we first introduce new deep spatio-temporal features by cascading 3-dimensional convolutional neural networks (C3Ds) and deep belief networks (DBNs) to model the spatial and temporal information present in video and audio for emotion recognition. We then propose a new feature-level fusion approach based on bilinear pooling theory to combine the visual and audio feature vectors. The proposed fusion strategy allows all elements of the component vectors to interact with each other, thereby capturing the complex, intrinsic associations between the component modalities. Extensive experiments on the eNTERFACE and FABO multimodal emotion databases demonstrate that the proposed system improves multimodal emotion recognition performance and significantly outperforms recent state-of-the-art approaches.
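
The abstract does not spell out how the bilinear fusion is computed. As a rough illustration only, the sketch below uses the generic Tensor Sketch formulation of compact bilinear pooling (Count Sketch projections combined by FFT-based circular convolution), which approximates the outer product of a visual and an audio feature vector in a fixed, small output dimension. Whether the paper uses exactly this formulation is an assumption, and the function names, feature sizes, and output dimension are hypothetical.

```python
import numpy as np

def make_sketch_params(dim, out_dim, rng):
    """Draw the fixed random hash indices and signs for one Count Sketch."""
    h = rng.integers(0, out_dim, size=dim)           # bucket index per input element
    s = rng.choice(np.array([-1.0, 1.0]), size=dim)  # random sign per input element
    return h, s

def count_sketch(x, h, s, out_dim):
    """Project x into out_dim buckets, summing signed entries that collide."""
    y = np.zeros(out_dim)
    np.add.at(y, h, s * x)  # unbuffered add, so repeated indices accumulate
    return y

def compact_bilinear(x, y, params_x, params_y, out_dim):
    """Fuse x and y: circular convolution of their Count Sketches via FFT.

    The result equals a Count Sketch of the outer product of x and y,
    so every element of x interacts with every element of y.
    """
    sx = count_sketch(x, *params_x, out_dim)
    sy = count_sketch(y, *params_y, out_dim)
    return np.fft.irfft(np.fft.rfft(sx) * np.fft.rfft(sy), n=out_dim)

# Hypothetical sizes: 512-d visual features, 128-d audio features, 1024-d fused vector.
rng = np.random.default_rng(0)
visual = rng.standard_normal(512)
audio = rng.standard_normal(128)
params_v = make_sketch_params(visual.size, 1024, rng)
params_a = make_sketch_params(audio.size, 1024, rng)
fused = compact_bilinear(visual, audio, params_v, params_a, 1024)
```

The hash indices and signs are drawn once and then reused for every sample, so the fused vectors of different samples remain comparable. The full outer product here would have 512 × 128 = 65,536 entries, whereas the sketch keeps 1,024.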

Bibliographic record

Source: Computer Vision and Image Understanding, 2018, Issue 9, pp. 33-42 (10 pages)
Authors: Dung Nguyen; Kien Nguyen; Sridha Sridharan; David Dean; Clinton Fookes
Affiliation: Speech Audio Image and Video Technology (SAIVT) Laboratory, Queensland University of Technology
Original format: PDF
Language: English

Similar literature

1. Deep spatio-temporal feature fusion with compact bilinear pooling for multimodal emotion recognition [J]. Dung Nguyen, Kien Nguyen, Sridha Sridharan. Computer Vision and Image Understanding. 2018 (Sep.)
2. Classification of engraved pottery sherds mixing deep-learning features by compact bilinear pooling [J]. Chetouani Aladine, Treuillet Sylvie, Exbrayat Matthieu. Pattern Recognition Letters. 2020 (Mar.)
3. Implementation of multimodal biometric recognition via multi-feature deep learning networks and feature fusion [J]. Tiong Leslie Ching Ow, Kim Seong Tae, Ro Yong Man. Multimedia Tools and Applications. 2019, Issue 16
4. Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition [C]. Yuanyuan Zhang, Zi-Rui Wang, Jun Du. International Joint Conference on Neural Networks. 2019
5. Feature quality fusion based multimodal eye recognition [D]. Zhou, Zhi. 2013
6. FusionSense: Emotion Classification Using Feature Fusion of Multimodal Data and Deep Learning in a Brain-Inspired Spiking Neural Network [O]. Clarence Tan, Gerardo Ceballos, Nikola Kasabov. 2020
7. Deep spatio-temporal features for multimodal emotion recognition [O]. Nguyen Tien Dung, Nguyen Thanh Kien, Sridharan Sridha. 2017
