Language-Driven Region Pointer Advancement for Controllable Image Captioning

Abstract

Controllable Image Captioning is a recent sub-field of the multi-modal task of Image Captioning in which constraints are placed on which regions of an image should be described in the generated natural language caption. This places a stronger focus on producing more detailed descriptions and opens the door to greater end-user control over the results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides when to attend to each region by advancing a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by its strong correlation with sentence structure in the training data. We find that our timing agrees with the ground-truth timing on the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves on the state of the art on standard captioning metrics while also demonstrating a considerably larger effective vocabulary size.
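
The NEXT-token idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The token string `<NEXT>`, the helper names (`insert_next_tokens`, `decode_with_pointer`, `advancement_positions`, `timing_precision_recall`) and the word-to-region alignment format are all hypothetical. The sketch assumes that each caption word is labelled with the index of the region it describes and that regions are visited in order. For the timing metric, it further assumes that the predicted and ground-truth token streams contain the same words, for example under teacher forcing, so advancement positions can be compared directly.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical name for the pointer-advancement token (illustrative only).
NEXT = "<NEXT>"


def insert_next_tokens(words: Sequence[str], region_ids: Sequence[int]) -> List[str]:
    """Build a training target with NEXT wherever the described region changes.

    Assumption: region_ids[i] is the index of the region that words[i]
    describes, and regions are visited in non-decreasing order.
    """
    out: List[str] = []
    current = region_ids[0] if region_ids else 0
    for word, rid in zip(words, region_ids):
        while rid > current:  # one NEXT per region boundary crossed
            out.append(NEXT)
            current += 1
        out.append(word)
    return out


def decode_with_pointer(tokens: Sequence[str], num_regions: int) -> Tuple[List[str], List[int]]:
    """Replay a token stream, advancing the region pointer whenever NEXT appears.

    Returns the caption words (NEXT removed) and, for each word, the region
    index the pointer was on when that word was produced.
    """
    last = max(num_regions - 1, 0)
    pointer = 0
    words: List[str] = []
    attended: List[int] = []
    for tok in tokens:
        if tok == NEXT:
            pointer = min(pointer + 1, last)  # never move past the final region
            continue
        words.append(tok)
        attended.append(pointer)
    return words, attended


def advancement_positions(tokens: Sequence[str]) -> Set[int]:
    """Positions in the word sequence (NEXT excluded) where at least one NEXT occurs.

    Position k means "before word k"; repeated NEXTs at one spot count once.
    """
    positions: Set[int] = set()
    n_words = 0
    for tok in tokens:
        if tok == NEXT:
            positions.add(n_words)
        else:
            n_words += 1
    return positions


def timing_precision_recall(pred: Sequence[str], gold: Sequence[str]) -> Tuple[float, float]:
    """Compare predicted and ground-truth advancement positions.

    Assumption: both streams contain the same words in the same order
    (e.g. predictions made under teacher forcing), so positions align.
    """
    p, g = advancement_positions(pred), advancement_positions(gold)
    hits = len(p & g)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(g) if g else 0.0
    return precision, recall


if __name__ == "__main__":
    words = ["a", "man", "rides", "a", "horse"]
    gold = insert_next_tokens(words, [0, 0, 0, 1, 1])
    pred = ["a", "man", NEXT, "rides", NEXT, "a", "horse"]
    print(gold)
    print(decode_with_pointer(pred, num_regions=2))
    print(timing_precision_recall(pred, gold))
```

In this toy example, the predicted stream advances one word too early and then again at the correct boundary. The result is a timing precision of 0.5 and a recall of 1.0. These numbers only illustrate the metric and are unrelated to the reported Flickr30k Entities results. Treating advancement as an ordinary vocabulary item means the same decoder softmax decides both what to say and when to move on. A separate advancement predictor would be the alternative design.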

Bibliographic Details

Source: International Conference on Computational Linguistics, 2020, pp. 1922-1935 (14 pages)
Authors: Annika Lindh; Robert Ross; John D. Kelleher
Original format: PDF

Similar Documents

1. Exploring region relationships implicitly: Image captioning with visual relationship attention [J]. Zhang Zongjian, Wu Qiang, Wang Yang. Image and Vision Computing, 2021, May issue.
2. Image Captioning Using Region-Based Attention Joint with Time-Varying Attention [J]. Wang Weixuan, Hu Haifeng. Neural Processing Letters, 2019, No. 1.
3. Mapping between image regions and caption concepts of captioned depictive photographs [C]. Neil C. Rowe. AAAI Workshop, 1998.
4. Novel surface rendering and object registration methods for three-dimensional medical imaging: The "SpiderWeb" surface algorithm and the "Pointers" technique for integrating multimodal images [D]. Karron, Daniel B. 1993.
5. MON-LB103 Hyperactivation of Reward and Cognitive Control Brain Regions in Response to Food Images in Women Compared to Men With Obesity [O]. Ethiopia D Getachew, Franziska Plessow, Avery Van De Water. 2020.
6. Mapping between image regions and caption concepts of captioned depictive photographs [O]. Rowe Neil C. 1998.
