New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Chen Wei; Wang Weiping; Liu Li; Lew Michael S.

Abstract

This survey focuses on two modalities in multimodal deep learning: image and text. Unlike classic reviews of deep learning, in which monomodal image classifiers such as VGG, ResNet and the Inception module are central topics, this paper examines recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. We also analyze two aspects of the challenge of achieving better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial in overcoming the aforementioned challenges. Finally, we include several promising directions for future research. © 2020 The Authors. Published by Elsevier B.V.

Bibliographic details

Source
Neurocomputing | 2021, Issue 22 | pp. 195-215 | 21 pages
Authors
Chen Wei; Wang Weiping; Liu Li; Lew Michael S.
Author affiliations

Leiden Univ LIACS NL-2333 CA Leiden Netherlands;

NUDT Coll Syst Engn Changsha 410073 Peoples R China;

NUDT Coll Syst Engn Changsha 410073 Peoples R China|Univ Oulu Ctr Machine Vis & Signal Anal Oulu Finland;

Leiden Univ LIACS NL-2333 CA Leiden Netherlands;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Multimodal deep learning; Ideas and trends; Content understanding; Literature review;
