IEEE International Conference on Acoustics, Speech and Signal Processing

Learning Disentangled Representation in Latent Stochastic Models: A Case Study with Image Captioning


Abstract

Multimodal tasks require learning a joint representation across modalities. In this paper, we present an approach that employs latent stochastic models for a multimodal task: image captioning. Encoder-decoder models with stochastic latent variables often face optimization issues such as latent collapse, which prevents them from realizing their full potential for rich representation learning and disentanglement. We present an approach to training such models by incorporating a joint continuous and discrete representation in the prior distribution. We evaluate the performance of the proposed approach on a multitude of metrics against vanilla latent stochastic models. We also perform a qualitative assessment and observe that the proposed approach indeed has the potential to learn composite information and explain novel combinations not seen in the training data.
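The abstract's key idea, combining continuous and discrete components in the latent prior, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the dimensions, the Gaussian reparameterization, and the Gumbel-Softmax relaxation of the discrete latent are assumptions chosen to make the joint continuous/discrete sampling concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_continuous(mu, log_var):
    """Reparameterized Gaussian sample: z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def sample_discrete(logits, tau=0.5):
    """Gumbel-Softmax relaxation of a categorical latent variable."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))         # stable softmax
    return y / y.sum(axis=-1, keepdims=True)

def joint_latent(mu, log_var, logits, tau=0.5):
    """Concatenate continuous and discrete samples into one latent code,
    as in a prior that mixes both representation types."""
    z_cont = sample_continuous(mu, log_var)
    z_disc = sample_discrete(logits, tau)
    return np.concatenate([z_cont, z_disc], axis=-1)

# Toy example: an 8-dim Gaussian latent plus a 4-way categorical latent.
mu, log_var = np.zeros(8), np.zeros(8)
logits = np.array([2.0, 0.1, 0.1, 0.1])
z = joint_latent(mu, log_var, logits)
print(z.shape)  # (12,)
```

In an encoder-decoder captioning model, `mu`, `log_var`, and `logits` would come from the image encoder, and `z` would condition the caption decoder; the discrete part can capture compositional factors while the continuous part captures smooth variation.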
