Abstract Video captioning aims to generate a natural-language description of a video's content, which places stringent demands on both fine-grained video feature extraction and the language processing of the caption text. This paper proposes a new training method that combines global control of the text with local strengthening: the full context can be referred to while the caption text is generated, and greater attention is given to important words in the text, such as nouns and predicate verbs. This approach markedly improves the recognition of objects and yields more accurate prediction of actions in the video. In addition, 2D and 3D multimodal feature extraction is adopted for the video feature extraction process, and better results are achieved through the fine-grained feature capture of global attention and the fusion of bidirectional temporal flow. The proposed method obtains good results on both the MSR-VTT and MSVD datasets.
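The "local strengthening" idea of weighting important words more heavily can be illustrated with a weighted token-level cross-entropy loss. The sketch below is an assumption about one plausible realization, not the paper's exact formulation: the function name `weighted_caption_loss`, the binary `key_mask` marking nouns and predicate verbs, and the `key_weight` factor are all hypothetical names introduced here for illustration.

```python
import torch
import torch.nn.functional as F

def weighted_caption_loss(logits, targets, key_mask, key_weight=2.0):
    """Cross-entropy over caption tokens with extra weight on key words.

    logits:   (seq_len, vocab_size) decoder outputs for one caption
    targets:  (seq_len,) ground-truth token ids
    key_mask: (seq_len,) 1 for nouns / predicate verbs, 0 otherwise
    """
    # Per-token loss, no reduction, so each position can be reweighted.
    per_token = F.cross_entropy(logits, targets, reduction="none")
    # Key words (nouns, predicate verbs) get key_weight; others get 1.0.
    weights = torch.where(key_mask.bool(),
                          torch.full_like(per_token, key_weight),
                          torch.ones_like(per_token))
    # Normalize by total weight so the loss scale stays comparable.
    return (weights * per_token).sum() / weights.sum()
```

With `key_weight=1.0` (or an all-zero mask) this reduces to ordinary mean cross-entropy; raising `key_weight` pushes the decoder to prioritize getting objects and actions right over function words.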