Journal: Advances in Multimedia

Real-Time Audio-Visual Analysis for Multiperson Videoconferencing

Abstract

We describe the design of a system built from several state-of-the-art real-time audio and video processing components that enable multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to let multiple people enter, interact in, and leave the observable scene without constraints. They comprise continuous localisation of audio objects and its application to spatial audio object coding; detection and tracking of faces; estimation of head poses and visual focus of attention; detection and localisation of verbal and paralinguistic events; and the association and fusion of these different events. Combined, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (such as a virtual director). Various experiments were performed to evaluate the system's performance. The results demonstrate the effectiveness of the proposed design and of the individual algorithms, as well as the benefit of fusing different modalities in this scenario.