Home > Foreign Journals > EURASIP Journal on Advances in Signal Processing > Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities
Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities



Abstract

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, most of which are due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of the errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both the audio and video modalities. First, the acoustic data are processed to obtain a set of spectrotemporal features as well as the 3D localization coordinates of the sound source. Second, a number of features are extracted from the video recordings by means of object detection, motion analysis, and multicamera person tracking, to represent the visual counterparts of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed. The experimental results show that information from both the microphone array and the video cameras helps to improve the detection rate of isolated as well as spontaneously generated acoustic events.
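The pipeline described in the abstract can be sketched in a few lines. This is only a minimal illustration under invented assumptions, not the paper's implementation: the feature dimensions and the synthetic data are made up, and a single diagonal Gaussian per class stands in for each binary HMM detector, keeping just the core ideas of feature-level fusion (per-frame concatenation of audio and video feature vectors) and a parallel bank of one-vs-background detectors.

```python
import numpy as np


def fuse_features(audio_feats, video_feats):
    """Feature-level fusion: concatenate the audio and video feature
    vectors of each frame into one joint vector. Shapes are
    hypothetical: (n_frames, d_audio) and (n_frames, d_video)."""
    assert audio_feats.shape[0] == video_feats.shape[0]
    return np.concatenate([audio_feats, video_feats], axis=1)


class BinaryGaussianDetector:
    """Stand-in for one binary detector in the parallel structure.
    The paper uses HMM-based detectors; here each class ('event' vs.
    'background') is modeled by a single diagonal Gaussian, and each
    frame is classified by log-likelihood ratio."""

    def fit(self, event_frames, background_frames):
        self.mu = [event_frames.mean(0), background_frames.mean(0)]
        # Small floor on the variance avoids division by zero.
        self.var = [event_frames.var(0) + 1e-6,
                    background_frames.var(0) + 1e-6]
        return self

    def _loglik(self, x, mu, var):
        # Diagonal-Gaussian log-likelihood, summed over dimensions.
        return -0.5 * np.sum(np.log(2 * np.pi * var)
                             + (x - mu) ** 2 / var, axis=1)

    def detect(self, frames):
        # True where the 'event' model explains the frame better.
        return (self._loglik(frames, self.mu[0], self.var[0])
                > self._loglik(frames, self.mu[1], self.var[1]))


# Synthetic demo: 4 audio dims, 3 video dims, fused to 7 dims.
rng = np.random.default_rng(0)
audio = rng.normal(size=(200, 4))
video = rng.normal(size=(200, 3))
fused = fuse_features(audio, video)          # shape (200, 7)

# Pretend the first 100 fused frames belong to an acoustic event
# (shifted distribution) and the rest are background.
event_frames = fused[:100] + 3.0
background_frames = fused[100:]
det = BinaryGaussianDetector().fit(event_frames, background_frames)
```

In the full system, one such detector would run per event class in parallel, and the Gaussian class models would be replaced by the HMMs trained on the fused spectrotemporal, localization, and video features.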

