Home > Foreign Journals > EURASIP Journal on Advances in Signal Processing > Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities
Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities



Abstract

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information produces a large number of errors, most of which are due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of the errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both the audio and video modalities. First, the acoustic data are processed to obtain a set of spectrotemporal features as well as the 3D localization coordinates of the sound source. Second, a number of features are extracted from the video recordings by means of object detection, motion analysis, and multicamera person tracking, to represent the visual counterparts of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed. The experimental results show that information from both the microphone array and the video cameras helps to improve the detection rate of isolated as well as spontaneously generated acoustic events.
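The pipeline described in the abstract can be sketched in a few lines. This is only a minimal illustration under invented assumptions, not the paper's implementation: the feature dimensions and the synthetic data are made up, and a single diagonal Gaussian per class stands in for each binary HMM detector, keeping just the core ideas of feature-level fusion (per-frame concatenation of audio and video feature vectors) and a parallel bank of one-vs-background detectors.

```python
import numpy as np


def fuse_features(audio_feats, video_feats):
    """Feature-level fusion: concatenate the audio and video feature
    vectors of each frame into one joint vector. Shapes are
    hypothetical: (n_frames, d_audio) and (n_frames, d_video)."""
    assert audio_feats.shape[0] == video_feats.shape[0]
    return np.concatenate([audio_feats, video_feats], axis=1)


class BinaryGaussianDetector:
    """Stand-in for one binary detector in the parallel structure.
    The paper uses HMM-based detectors; here each class ('event' vs.
    'background') is modeled by a single diagonal Gaussian, and each
    frame is classified by log-likelihood ratio."""

    def fit(self, event_frames, background_frames):
        self.mu = [event_frames.mean(0), background_frames.mean(0)]
        # Small floor on the variance avoids division by zero.
        self.var = [event_frames.var(0) + 1e-6,
                    background_frames.var(0) + 1e-6]
        return self

    def _loglik(self, x, mu, var):
        # Diagonal-Gaussian log-likelihood, summed over dimensions.
        return -0.5 * np.sum(np.log(2 * np.pi * var)
                             + (x - mu) ** 2 / var, axis=1)

    def detect(self, frames):
        # True where the 'event' model explains the frame better.
        return (self._loglik(frames, self.mu[0], self.var[0])
                > self._loglik(frames, self.mu[1], self.var[1]))


# Synthetic demo: 4 audio dims, 3 video dims, fused to 7 dims.
rng = np.random.default_rng(0)
audio = rng.normal(size=(200, 4))
video = rng.normal(size=(200, 3))
fused = fuse_features(audio, video)          # shape (200, 7)

# Pretend the first 100 fused frames belong to an acoustic event
# (shifted distribution) and the rest are background.
event_frames = fused[:100] + 3.0
background_frames = fused[100:]
det = BinaryGaussianDetector().fit(event_frames, background_frames)
```

In the full system, one such detector would run per event class in parallel, and the Gaussian class models would be replaced by the HMMs trained on the fused spectrotemporal, localization, and video features.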

