Image captioning, in which a textual description of an image's content is generated, is a significant step toward automatic interaction between humans and computers. Recently, the transformer-based encoder-decoder paradigm has achieved great success in image captioning. Such models are usually trained with a cross-entropy loss function; however, different captions of the same image that share the same meaning may yield different losses. As a result, the generated descriptions tend to be uniform, which limits the diversity of image captioning. In this paper, we present the attention-reinforced transformer (ArCo), a transformer-based architecture for image captioning. The architecture improves the image encoding stage by integrating a feature attention block (FAB) that exploits the relationships between image regions. During the training phase, we train the model with a combination of cross-entropy loss and contrastive loss. We experimentally compare the performance of ArCo with that of other fully attentive models, and we also validate the transformer baseline for image captioning with different pre-trained models. Our proposed approach achieves new state-of-the-art performance on the offline 'Karpathy' test split and the online test server.

(c) 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).