
An Effective Dense Co-Attention Networks for Visual Question Answering

Abstract

At present, state-of-the-art approaches to Visual Question Answering (VQA) mainly use co-attention models to relate each visual object to text objects, which achieves only coarse interactions between modalities. However, they ignore dense self-attention within the question modality. To solve this problem and improve the accuracy of VQA tasks, this paper proposes an effective Dense Co-Attention Network (DCAN). First, to better capture the relationships between words that are relatively far apart and to make the extracted semantics more robust, a Bidirectional Long Short-Term Memory (Bi-LSTM) network is introduced to encode questions and answers; second, to realize fine-grained interactions between question words and image regions, a dense multimodal co-attention model is proposed. The model's basic components are a self-attention unit and a guided-attention unit, which are cascaded in depth to form a hierarchical structure. Experimental results on the VQA-v2 dataset show that DCAN has clear performance advantages, which makes VQA applicable to a wider range of AI scenarios.
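
The architecture described in the abstract lends itself to a short sketch. The following PyTorch code is an illustrative assumption, not the authors' released implementation: all module names, hidden sizes, depths, and head counts are placeholders. It shows how a Bi-LSTM question encoder and a stack of co-attention layers, each pairing a self-attention unit with a guided-attention unit, could be cascaded in depth.

```python
# Hypothetical sketch of a dense co-attention layer stack for VQA.
# Dimensions, head counts, and class names are assumptions, not the paper's code.
import torch
import torch.nn as nn


class QuestionEncoder(nn.Module):
    """Bi-LSTM encoder for question word embeddings (embeddings assumed precomputed)."""
    def __init__(self, embed_dim=300, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2,
                            bidirectional=True, batch_first=True)

    def forward(self, word_embs):             # (B, num_words, embed_dim)
        out, _ = self.lstm(word_embs)         # (B, num_words, hidden_dim)
        return out


class CoAttentionLayer(nn.Module):
    """One layer: self-attention within each modality, then question-guided
    attention over image regions."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.q_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.guided = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_g = nn.LayerNorm(dim)

    def forward(self, q, v):                  # q: (B, words, dim), v: (B, regions, dim)
        q = self.norm_q(q + self.q_self(q, q, q)[0])   # dense self-attention on question
        v = self.norm_v(v + self.v_self(v, v, v)[0])   # self-attention on image regions
        v = self.norm_g(v + self.guided(v, q, q)[0])   # question-guided attention on regions
        return q, v


class DCANSketch(nn.Module):
    """Cascade of co-attention layers forming the hierarchical structure."""
    def __init__(self, dim=512, depth=6):
        super().__init__()
        self.layers = nn.ModuleList(CoAttentionLayer(dim) for _ in range(depth))

    def forward(self, q, v):
        for layer in self.layers:
            q, v = layer(q, v)
        return q, v
```

Stacking the co-attention layers corresponds to the hierarchical structure mentioned in the abstract: each layer refines the question representation with dense self-attention and then lets the image regions attend to the refined question, yielding progressively finer-grained word-region interactions.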