
Integrating Both Visual and Audio Cues for Enhanced Video Caption

Abstract

Video captioning, the task of automatically generating a descriptive sentence for a short video clip, has achieved remarkable success recently. However, most existing methods focus on visual information and ignore the synchronized audio cues. We propose three multimodal deep fusion strategies to maximize the benefits of visual-audio resonance information. The first explores the impact of cross-modal feature fusion from low to high order. The second establishes a short-term visual-audio dependency by sharing the weights of the corresponding front-end networks. The third extends the temporal dependency to the long term by sharing a multimodal memory across the visual and audio modalities. Extensive experiments validate the effectiveness of the three cross-modal fusion strategies on two benchmark datasets: Microsoft Research Video to Text (MSRVTT) and Microsoft Video Description (MSVD). Notably, weight sharing coordinates visual-audio feature fusion effectively and achieves state-of-the-art performance on both the BLEU and METEOR metrics. Furthermore, we are the first to propose a dynamic multimodal feature fusion framework to handle the case where some modalities are missing. Experimental results demonstrate that, even when audio is absent, we can still obtain comparable results with the aid of an additional audio modality inference module.
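The abstract does not say which operators the "low to high order" fusion study compares. As a rough illustration only, the sketch below contrasts three common choices: concatenation (first order), the element-wise product, and the flattened outer product (both second order). The function names, the toy feature vectors, and the choice of operators are assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: operators and names are assumptions,
# not the fusion methods actually used in the paper.

def concat_fusion(visual, audio):
    """Low-order fusion: concatenate the visual and audio feature vectors."""
    return list(visual) + list(audio)


def elementwise_fusion(visual, audio):
    """Second-order fusion over matching dimensions (Hadamard product)."""
    if len(visual) != len(audio):
        raise ValueError("element-wise fusion needs equal-length features")
    return [v * a for v, a in zip(visual, audio)]


def outer_product_fusion(visual, audio):
    """Second-order fusion over all dimension pairs (flattened outer product)."""
    return [v * a for v in visual for a in audio]


# Toy per-frame features (hypothetical values).
visual_feat = [1.0, 2.0]
audio_feat = [3.0, 4.0]

fused_concat = concat_fusion(visual_feat, audio_feat)
fused_hadamard = elementwise_fusion(visual_feat, audio_feat)
fused_outer = outer_product_fusion(visual_feat, audio_feat)
```

Concatenation keeps the two modalities separate and leaves any interaction to later layers, whereas the product-based variants encode pairwise visual-audio interactions directly, at the cost of a larger or more constrained feature dimension.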
