
An Effective Dense Co-Attention Networks for Visual Question Answering

Abstract

At present, state-of-the-art approaches to Visual Question Answering (VQA) mainly use co-attention models to relate each visual object to text objects, which achieves only coarse interactions between modalities. However, they ignore dense self-attention within the question modality. To solve this problem and improve the accuracy of VQA tasks, this paper proposes an effective Dense Co-Attention Network (DCAN). First, to better capture the relationships between words that are relatively far apart and to make the extracted semantics more robust, a Bidirectional Long Short-Term Memory (Bi-LSTM) network is introduced to encode questions and answers; second, to realize fine-grained interactions between question words and image regions, a dense multimodal co-attention model is proposed. The model's basic components are a self-attention unit and a guided-attention unit, which are cascaded in depth to form a hierarchical structure. Experimental results on the VQA-v2 dataset show that DCAN has clear performance advantages, which makes VQA applicable to a wider range of AI scenarios.
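
The architecture described in the abstract lends itself to a short sketch. The following PyTorch code is an illustrative assumption, not the authors' released implementation: all module names, hidden sizes, depths, and head counts are placeholders. It shows how a Bi-LSTM question encoder and a stack of co-attention layers, each pairing a self-attention unit with a guided-attention unit, could be cascaded in depth.

```python
# Hypothetical sketch of a dense co-attention layer stack for VQA.
# Dimensions, head counts, and class names are assumptions, not the paper's code.
import torch
import torch.nn as nn


class QuestionEncoder(nn.Module):
    """Bi-LSTM encoder for question word embeddings (embeddings assumed precomputed)."""
    def __init__(self, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)

    def forward(self, word_embs):             # (B, num_words, embed_dim)
        out, _ = self.lstm(word_embs)         # (B, num_words, hidden_dim)
        return out


class CoAttentionLayer(nn.Module):
    """One layer: self-attention within each modality, then question-guided
    attention over image regions."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.q_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.guided = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_g = nn.LayerNorm(dim)

    def forward(self, q, v):                  # q: (B, words, dim), v: (B, regions, dim)
        q = self.norm_q(q + self.q_self(q, q, q)[0])   # dense self-attention on question
        v = self.norm_v(v + self.v_self(v, v, v)[0])   # self-attention on image regions
        v = self.norm_g(v + self.guided(v, q, q)[0])   # question-guided attention on regions
        return q, v


class DCANSketch(nn.Module):
    """Cascade of co-attention layers forming the hierarchical structure."""
    def __init__(self, dim=512, depth=6):
        super().__init__()
        self.layers = nn.ModuleList(CoAttentionLayer(dim) for _ in range(depth))

    def forward(self, q, v):
        for layer in self.layers:
            q, v = layer(q, v)
        return q, v
```

Stacking the co-attention layers corresponds to the hierarchical structure mentioned in the abstract: each layer refines the question representation with dense self-attention and then lets the image regions attend to the refined question, yielding progressively finer-grained word-region interactions.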