IEEE/CVF Conference on Computer Vision and Pattern Recognition

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence


Abstract

Deep models that are both effective and explainable are desirable in many settings; prior explainable models have been unimodal, offering either image-based visualization of attention weights or text-based generation of post-hoc justifications. We propose a multimodal approach to explanation, and argue that the two modalities provide complementary explanatory strengths. We collect two new datasets to define and evaluate this task, and propose a novel model which can provide joint textual rationale generation and attention visualization. Our datasets define visual and textual justifications of a classification decision for activity recognition tasks (ACT-X) and for visual question answering tasks (VQA-X). We quantitatively show that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision. We also qualitatively show cases where visual explanation is more insightful than textual explanation, and vice versa, supporting our thesis that multimodal explanation models offer significant benefits over unimodal approaches.
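The "pointing to the evidence" half of the approach rests on soft attention over image regions, where the normalized attention weights double as the visual explanation. This is a minimal sketch of that idea, not the authors' implementation; the dot-product scoring, the region count, and all names here are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(regions, query):
    """Soft attention over K image-region features.

    regions: (K, D) array of region features.
    query:   (D,) embedding of the question/activity context.
    Returns (weights, pooled): weights over regions, which can be
    rendered on the image as the visual explanation, and the
    attention-pooled feature used for the classification decision.
    """
    scores = regions @ query      # (K,) relevance of each region
    weights = softmax(scores)     # normalized attention map
    pooled = weights @ regions    # (D,) evidence-weighted summary
    return weights, pooled

rng = np.random.default_rng(0)
regions = rng.normal(size=(6, 8))  # 6 hypothetical image regions
query = rng.normal(size=8)
weights, pooled = attend(regions, query)
```

In a full model of the kind the abstract describes, the pooled feature would feed both the classifier and a text decoder that generates the rationale, so the textual and visual explanations are trained jointly.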
