AAAI Conference on Artificial Intelligence

Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos

Abstract

We propose a new zero-shot event detection method based on multimodal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities, from videos into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model built on top of distributional semantics, and it extends that framework in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) automatically determining the relevance of concepts/attributes to a free-text query, which could be useful for other applications, and (c) retrieving videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between the videos and the event query given in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state of the art, which uses longer textual descriptions of the event, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster.
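The retrieval pipeline described in the abstract (detect concepts in each video, embed the concept names and the query text into a shared distributional semantic space, and rank videos by similarity to the query) can be illustrated with a minimal sketch. The word vectors, concept names, and detector confidences below are toy placeholders, not the paper's actual embeddings or detectors; a real system would load pretrained distributional word vectors and use the scores of trained object/action detectors.

```python
import numpy as np

EMB_DIM = 4

# Toy word vectors; a real system would load pretrained distributional
# embeddings (e.g., word2vec) instead of these hand-picked placeholders.
word_vec = {
    "changing": np.array([0.9, 0.1, 0.0, 0.2]),
    "vehicle":  np.array([0.8, 0.0, 0.1, 0.3]),
    "tire":     np.array([0.7, 0.2, 0.0, 0.4]),
    "car":      np.array([0.8, 0.1, 0.1, 0.3]),
    "wrench":   np.array([0.6, 0.3, 0.0, 0.5]),
    "dog":      np.array([0.0, 0.9, 0.8, 0.1]),
    "running":  np.array([0.1, 0.8, 0.7, 0.2]),
}

def embed_text(text):
    """Average the word vectors of a phrase (unknown words are skipped)."""
    vecs = [word_vec[w] for w in text.lower().split() if w in word_vec]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)

def embed_video(concept_scores):
    """Confidence-weighted average of the embeddings of detected concepts."""
    total, weight = np.zeros(EMB_DIM), 0.0
    for concept, score in concept_scores.items():
        total += score * embed_text(concept)
        weight += score
    return total / weight if weight > 0 else total

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0

# Hypothetical concept-detector confidences for two videos.
videos = {
    "video_1": {"car": 0.9, "wrench": 0.7, "tire": 0.8},
    "video_2": {"dog": 0.9, "running": 0.8},
}

# Rank videos by similarity between their embedding and the free-text query.
query_vec = embed_text("changing a vehicle tire")
ranked = sorted(videos, key=lambda v: cosine(embed_video(videos[v]), query_vec),
                reverse=True)
print(ranked)  # the tire-changing video should rank first in this toy example
```

This sketch only covers zero-shot retrieval from an event title; the paper's full model additionally weighs how relevant each concept is to the query and fuses further modalities beyond the visual concepts.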
