Home > Foreign Conference Papers > European Conference on Computer Vision > Watch Hours in Minutes: Summarizing Videos with User Intent

Watch Hours in Minutes: Summarizing Videos with User Intent



Abstract

With the ever-increasing growth of video content, automatic video summarization has become an important task that has attracted considerable interest in the research community. One of the challenges that makes it a hard problem is the presence of multiple 'correct answers'. Because of the highly subjective nature of the task, there can be different "ideal" summaries of a video. Modelling user intent in the form of queries has been proposed in the literature as a way to alleviate this problem. A query-focused summary is expected to contain shots that are relevant to the query, in conjunction with other important shots. For practical deployments in which very long videos need to be summarized, the need to capture the user's intent becomes all the more pronounced. In this work, we propose a simple two-stage method that takes a user query and a video as input and generates a query-focused summary. Specifically, in the first stage, we employ attention within each segment and across all segments, combined with the query, to learn a feature representation of each shot. In the second stage, the learned features are again fused with the query to predict a score for each shot by regressing through fully connected layers. We then assemble the summary by arranging the top-scoring shots in chronological order. Extensive experiments on a benchmark query-focused video summarization dataset for long videos yield better results than the current state of the art, demonstrating the effectiveness of our method even without employing computationally expensive architectures such as LSTMs, variational autoencoders, GANs, or reinforcement learning, as most past works do.
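The two-stage pipeline described in the abstract can be sketched in a few lines. This is a minimal illustrative mock-up, not the authors' implementation: the attention is reduced to plain query-keyed softmax weighting, the fully connected regressor is a single random-weight projection, and all shapes, segment boundaries, and the `query_focused_summary` helper are hypothetical placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def query_focused_summary(shots, query, segments, k, rng):
    """Two-stage sketch: (1) query-conditioned attention within each
    segment and across segments refines shot features; (2) a tiny
    regressor scores each fused shot feature, and the top-k shots are
    returned in chronological order. Weights are random placeholders."""
    n, d = shots.shape
    # Stage 1: attention within each segment, keyed by the query.
    refined = np.zeros_like(shots)
    seg_reprs = []
    for seg in segments:
        feats = shots[seg]                     # (m, d) shots of one segment
        attn = softmax(feats @ query)          # query relevance within segment
        context = attn @ feats                 # segment context vector
        refined[seg] = feats + context         # residual fusion
        seg_reprs.append(context)
    # Attention across segments: weight each segment context by the query.
    seg_reprs = np.stack(seg_reprs)            # (num_segments, d)
    cross = softmax(seg_reprs @ query) @ seg_reprs
    refined = refined + cross                  # broadcast global context
    # Stage 2: fuse refined features with the query, regress a score.
    fused = np.concatenate([refined, np.tile(query, (n, 1))], axis=1)
    w = rng.standard_normal(2 * d)             # stand-in for FC-layer weights
    scores = fused @ w
    top = np.argsort(scores)[-k:]              # k highest-scoring shots
    return np.sort(top)                        # chronological order

rng = np.random.default_rng(0)
shots = rng.standard_normal((12, 8))           # 12 shots, 8-dim features
query = rng.standard_normal(8)                 # embedded user query
segments = [range(0, 4), range(4, 8), range(8, 12)]
summary = query_focused_summary(shots, query, segments, k=4, rng=rng)
print(summary)  # 4 shot indices in ascending (chronological) order
```

In a real system the shot features would come from a video encoder, the query from a text embedding, and the scoring weights would be trained against ground-truth summaries; the chronological reordering of the top-k shots is the only part taken literally from the abstract.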

