Journal: The Visual Computer

Multiple answers to a question: a new approach for visual question answering

Abstract

With the advent of deep learning, multi-modal data have attracted great interest. One multi-modal task that falls within the computer vision domain is visual question answering (VQA). In VQA, a question and an image are fed into a model, which tries to answer the question according to the image. To the best of our knowledge, current techniques look at the image and give only one answer to the question asked. However, in some situations, the question has several answers. In this paper, we address this problem and define a new domain within the VQA task, together with a new computationally efficient approach to multiple-answer VQA. In this approach, we use a sliding window efficiently to examine the answer to the question in different parts of the image. Because no suitable dataset has so far been available for multiple-answer VQA, we provide a new dataset for evaluating our proposed model. The experiments show that our model uses 94% fewer operations than other models, making it well suited to real-time applications.
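
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of scanning an image with a sliding window and collecting one answer per region. It is not the authors' method. The names `sliding_windows`, `multiple_answers`, and `toy_answer_fn` are hypothetical, the "image" is a small grid of integer labels, and the toy answer function stands in for a real VQA model.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Grid = List[List[int]]

# Hypothetical label map for the toy example: 0 is background.
LABELS = {1: "cat", 2: "dog"}


def sliding_windows(image: Grid, size: int, stride: int) -> Iterator[Tuple[int, int, Grid]]:
    """Yield (top, left, crop) for every size x size window that fits in the image."""
    height, width = len(image), len(image[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            crop = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, crop


def multiple_answers(
    image: Grid,
    question: str,
    answer_fn: Callable[[Grid, str], Optional[str]],
    size: int,
    stride: int,
) -> List[str]:
    """Collect the distinct answers found across all windows, in first-seen order."""
    answers: List[str] = []
    for _, _, crop in sliding_windows(image, size, stride):
        answer = answer_fn(crop, question)
        if answer is not None and answer not in answers:
            answers.append(answer)
    return answers


def toy_answer_fn(crop: Grid, question: str) -> Optional[str]:
    """Stand-in for a VQA model (assumption): report the labelled object in the crop, if any."""
    values = {v for row in crop for v in row if v in LABELS}
    return LABELS[max(values)] if values else None


# A 4 x 4 toy image with a "cat" region in the top left and a "dog" region in the bottom right.
IMAGE: Grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

if __name__ == "__main__":
    print(multiple_answers(IMAGE, "What animals are in the picture?", toy_answer_fn, 2, 2))
```

In this sketch, a single-answer model would have to pick either "cat" or "dog" for the whole image, whereas the per-window scan returns both. The window size and stride are free parameters here; how the paper chooses them, and how it achieves its reported reduction in operations, is not stated in the abstract.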
