New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Journal: Neurocomputing


Abstract

This survey focuses on two modalities in multimodal deep learning: image and text. Unlike classic reviews of deep learning, in which monomodal image classifiers such as VGG, ResNet and Inception are the central topics, this paper examines recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. We also analyze two aspects of the challenge of achieving better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial in overcoming these challenges. Finally, we outline several promising directions for future research. (C) 2020 The Authors. Published by Elsevier B.V.
