International Conference on Computational Linguistics

Language-Driven Region Pointer Advancement for Controllable Image Captioning

Abstract

Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.
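
The abstract's central idea is to make pointer advancement part of the output vocabulary through a NEXT-token. A small Python sketch can illustrate it. It shows how a decoded token stream determines which region each word is attended against, and how precision and recall of predicted advancement timing could be scored against ground-truth timing. Everything in it is an illustrative assumption, not the authors' implementation: the token spelling `<NEXT>`, the function names, measuring timing as the number of words emitted before each advancement, and clamping the pointer at the last region.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical spelling of the pointer-advancement token (assumption, not from the paper).
NEXT = "<NEXT>"


def attend_regions(tokens: Sequence[str], regions: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each generated word with the region the pointer targets when the word is emitted.

    A NEXT token produces no word of its own; it only moves the pointer to the
    next region in the control sequence. Assumes `regions` is non-empty.
    """
    pointer = 0
    aligned = []
    for tok in tokens:
        if tok == NEXT:
            # Advance, but (by assumption) never past the last region.
            pointer = min(pointer + 1, len(regions) - 1)
        else:
            aligned.append((tok, regions[pointer]))
    return aligned


def advancement_positions(tokens: Sequence[str]) -> Set[int]:
    """Timing of each advancement, measured as the number of words emitted before it.

    In this simplified sketch, consecutive NEXT tokens at the same word position
    collapse into a single entry.
    """
    positions = set()
    words_so_far = 0
    for tok in tokens:
        if tok == NEXT:
            positions.add(words_so_far)
        else:
            words_so_far += 1
    return positions


def timing_precision_recall(pred: Set[int], gold: Set[int]) -> Tuple[float, float]:
    """Precision and recall of predicted advancement positions against ground-truth positions."""
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    caption = ["a", "man", "rides", NEXT, "a", "horse"]
    print(attend_regions(caption, ["region_man", "region_horse"]))
    # [('a', 'region_man'), ('man', 'region_man'), ('rides', 'region_man'),
    #  ('a', 'region_horse'), ('horse', 'region_horse')]
    print(advancement_positions(caption))         # {3}
    print(timing_precision_recall({3, 5}, {3}))   # (0.5, 1.0)
```

Because the pointer moves only when the decoder emits the NEXT-token, the timing of each advancement is decided by the same next-token prediction that produces the words of the caption.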