Learning Answer Embeddings for Visual Question Answering



Abstract

We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly, and the other for the answers. The learning objective is to find the parameterization of those embeddings under which the correct answer has a higher likelihood than the other possible answers. In contrast to several existing approaches that treat Visual QA as multi-way classification, the proposed approach takes the semantic relationships among answers (as characterized by the embeddings) into consideration, instead of viewing answers as independent ordinal numbers. Thus, the learned embedding function can be used to embed answers unseen in the training dataset. These properties make the approach particularly appealing for transfer learning in open-ended Visual QA, where the source dataset on which the model is learned has limited overlap with the target dataset in the space of answers. We have also developed large-scale optimization techniques for applying the model to datasets with a large number of answers, where the challenge is to properly normalize the proposed probabilistic model. We validate our approach on several Visual QA datasets and investigate its utility for transferring models across datasets. The empirical results show that the approach performs well not only on in-domain learning but also on transfer learning.
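The scoring idea in the abstract can be illustrated with a minimal sketch. It assumes that compatibility between the joint image-question embedding and an answer embedding is an inner product, and that the likelihood is a softmax over all candidate answers; the abstract does not state the exact form, so both are assumptions. The names `answer_log_probs`, `negative_log_likelihood`, `f_iq`, and `g_answers` are hypothetical, and the vectors are hand-picked toy values, not learned embeddings.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def answer_log_probs(joint_embedding, answer_embeddings):
    """Log-softmax over the scores of all candidate answers.

    Assumption: the score of an answer is the inner product between the
    joint image-question embedding and that answer's embedding.
    """
    scores = [dot(joint_embedding, a) for a in answer_embeddings]
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


def negative_log_likelihood(joint_embedding, answer_embeddings, correct_index):
    """Training loss for one example: -log p(correct answer | image, question)."""
    return -answer_log_probs(joint_embedding, answer_embeddings)[correct_index]


# Toy values (hypothetical): one image-question embedding, three answers.
f_iq = [1.0, 0.0]
g_answers = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]

log_probs = answer_log_probs(f_iq, g_answers)
probs = [math.exp(lp) for lp in log_probs]
best = max(range(len(probs)), key=probs.__getitem__)
```

Because answers are represented by embeddings and not by fixed class indices, an answer absent from training could be scored by appending its embedding to `g_answers`. The normalizer sums over every candidate answer, which is the step that becomes costly when the answer set is large.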


Bibliographic details

Source: IEEE/CVF Conference on Computer Vision and Pattern Recognition | 2018 | 731p | 9 pages
Authors: Fei Sha; Wei-Lun Chao; Hexiang Hu
Original format: PDF
Chinese Library Classification: TP391.41-53
Keywords: Visualization; Semantics; Probabilistic logic; Computational modeling; Task analysis; Training; Adaptation models


Similar literature


1. Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space [J]. Gao Difei, Wang Ruiping, Shan Shiguang. IEEE Journal of Selected Topics in Signal Processing, 2020, No. 3.
2. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering [J]. Pan Lu, Lei Ji, Wei Zhang. SIGKDD Explorations, 2018.
3. Multiple answers to a question: a new approach for visual question answering [J]. Hosseinabad Sayedshayan Hashemi, Safayani Mehran, Mirzaei Abdolreza. The Visual Computer, 2021, No. 1.
4. Learning Answer Embeddings for Visual Question Answering [C]. Fei Sha, Wei-Lun Chao, Hexiang Hu. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
5. An Analysis of Bottom-Up Attention Models and Multimodal Representation Learning for Visual Question Answering [D]. Narayanan, Venkatraman. 2019.
6. An Effective Dense Co-Attention Networks for Visual Question Answering [O]. Shirong He, Dezhi Han. 2020.
7. Learning Answer Embeddings for Visual Question Answering [O]. Fei Sha, Wei-Lun Chao, Hexiang Hu. 2018.
8. Learning Strategy Training Program: Questions and Answers for Effective Learning [R]. Dansereau, D. F., Long, G. L., McDonald, B. A. 1975.
