IEEE International Conference on Acoustics, Speech and Signal Processing

Learning Disentangled Representation in Latent Stochastic Models: A Case Study with Image Captioning


Abstract

Multimodal tasks require learning a joint representation across modalities. In this paper, we present an approach that employs latent stochastic models for a multimodal task: image captioning. Encoder-decoder models with stochastic latent variables often face optimization issues such as latent collapse, which prevents them from realizing their full potential for rich representation learning and disentanglement. We present an approach to training such models by incorporating a joint continuous and discrete representation in the prior distribution. We evaluate the performance of the proposed approach on a multitude of metrics against vanilla latent stochastic models. We also perform a qualitative assessment and observe that the proposed approach indeed has the potential to learn composite information and explain novel combinations not seen in the training data.
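The abstract's key idea, combining continuous and discrete components in the latent prior, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, the Gaussian reparameterization, and the Gumbel-Softmax relaxation of the discrete latent are assumptions chosen to make the joint continuous/discrete sampling concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_continuous(mu, log_var):
    """Reparameterized Gaussian sample: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def sample_discrete(logits, tau=0.5):
    """Gumbel-Softmax relaxation of a categorical latent variable."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))         # stable softmax
    return y / y.sum(axis=-1, keepdims=True)

def joint_latent(mu, log_var, logits, tau=0.5):
    """Concatenate continuous and discrete samples into one latent code,
    as in a prior that mixes both representation types."""
    z_cont = sample_continuous(mu, log_var)
    z_disc = sample_discrete(logits, tau)
    return np.concatenate([z_cont, z_disc], axis=-1)

# Toy example: an 8-dim Gaussian latent plus a 4-way categorical latent.
mu, log_var = np.zeros(8), np.zeros(8)
logits = np.array([2.0, 0.1, 0.1, 0.1])
z = joint_latent(mu, log_var, logits)
print(z.shape)  # (12,)
```

In an encoder-decoder captioning model, `mu`, `log_var`, and `logits` would come from the image encoder, and `z` would condition the caption decoder; the discrete part can capture compositional factors while the continuous part captures smooth variation.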
