AAAI Conference on Artificial Intelligence

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Abstract

We propose a new zero-shot event detection method based on multimodal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities, from videos into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model built on top of distributional semantics, and it extends that framework in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) automatically determining the relevance of concepts/attributes to a free-text query, which could be useful for other applications, and (c) retrieving videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between the videos and the event query given in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state of the art, which uses longer textual descriptions of the event, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster.
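The retrieval pipeline described in the abstract (detect concepts in each video, embed the concept names and the query text into a shared distributional semantic space, and rank videos by similarity to the query) can be illustrated with a minimal sketch. The word vectors, concept names, and detector confidences below are toy placeholders, not the paper's actual embeddings or detectors; a real system would load pretrained distributional word vectors and use the scores of trained object/action detectors.

```python
import numpy as np

EMB_DIM = 4

# Toy word vectors; a real system would load pretrained distributional
# embeddings (e.g., word2vec) instead of these hand-picked placeholders.
word_vec = {
    "changing": np.array([0.9, 0.1, 0.0, 0.2]),
    "vehicle":  np.array([0.8, 0.0, 0.1, 0.3]),
    "tire":     np.array([0.7, 0.2, 0.0, 0.4]),
    "car":      np.array([0.8, 0.1, 0.1, 0.3]),
    "wrench":   np.array([0.6, 0.3, 0.0, 0.5]),
    "dog":      np.array([0.0, 0.9, 0.8, 0.1]),
    "running":  np.array([0.1, 0.8, 0.7, 0.2]),
}

def embed_text(text):
    """Average the word vectors of a phrase (unknown words are skipped)."""
    vecs = [word_vec[w] for w in text.lower().split() if w in word_vec]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)

def embed_video(concept_scores):
    """Confidence-weighted average of the embeddings of detected concepts."""
    total, weight = np.zeros(EMB_DIM), 0.0
    for concept, score in concept_scores.items():
        total += score * embed_text(concept)
        weight += score
    return total / weight if weight > 0 else total

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0

# Hypothetical concept-detector confidences for two videos.
videos = {
    "video_1": {"car": 0.9, "wrench": 0.7, "tire": 0.8},
    "video_2": {"dog": 0.9, "running": 0.8},
}

# Rank videos by similarity between their embedding and the free-text query.
query_vec = embed_text("changing a vehicle tire")
ranked = sorted(videos, key=lambda v: cosine(embed_video(videos[v]), query_vec),
                reverse=True)
print(ranked)  # the tire-changing video should rank first in this toy example
```

This sketch only covers zero-shot retrieval from an event title; the paper's full model additionally weighs how relevant each concept is to the query and fuses further modalities beyond the visual concepts.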
