Hybrid Attention Driven Text-to-image Synthesis via Generative Adversarial Networks

International Conference on Artificial Neural Networks

Abstract

With the development of generative models, image synthesis conditioned on a given variable has gradually become an important research topic. This paper presents a novel spectral-normalization-based Hybrid Attentional Generative Adversarial Network (HAGAN) for text-to-image synthesis. The hybrid attention mechanism combines text-image cross-modal attention with self-attention over image sub-regions. The cross-modal attention mechanism helps synthesize finer-grained, text-relevant images by introducing word-level semantic information into the generative model. The self-attention resolves long-range dependencies among local image-region features during generation. With spectral normalization, training becomes more stable than in conventional GANs, which helps avoid mode collapse and vanishing or exploding gradients. We conduct experiments on the widely used Oxford-102 flower dataset and the CUB bird dataset to validate the proposed method. In both quantitative and qualitative comparisons, the results indicate that the proposed method achieves the best performance in Inception Score (IS), Fréchet Inception Distance (FID), and visual quality.
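
The abstract pairs word-to-region cross-modal attention (in the spirit of AttnGAN) with self-attention over image sub-regions (in the spirit of SAGAN). Since no code accompanies the abstract, the following PyTorch sketch is purely illustrative: the module names, projection dimensions, and the additive fusion at the end are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    # Self-attention over image sub-regions: every spatial location can
    # attend to every other one, capturing long-range dependencies.
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key   = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):                                  # x: (b, c, h, w)
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)       # (b, hw, c//8)
        k = self.key(x).flatten(2)                         # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)          # (b, hw, hw) region-to-region weights
        v = self.value(x).flatten(2)                       # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                        # residual connection

class CrossModalAttention(nn.Module):
    # Cross-modal attention: each image region gathers a word-level
    # context vector from the text encoder's word embeddings.
    def __init__(self, channels, word_dim):
        super().__init__()
        self.proj = nn.Conv1d(word_dim, channels, 1)       # words -> image feature space

    def forward(self, x, words):          # x: (b, c, h, w); words: (b, word_dim, seq_len)
        b, c, h, w = x.shape
        words = self.proj(words)                               # (b, c, seq_len)
        regions = x.flatten(2).transpose(1, 2)                 # (b, hw, c)
        attn = F.softmax(torch.bmm(regions, words), dim=-1)    # (b, hw, seq_len)
        context = torch.bmm(attn, words.transpose(1, 2))       # (b, hw, c)
        return context.transpose(1, 2).view(b, c, h, w)        # word context per region

feats = torch.randn(2, 64, 16, 16)    # illustrative generator feature map
words = torch.randn(2, 256, 18)       # illustrative word embeddings (18 words, 256-d)
hybrid = SelfAttention(64)(feats) + CrossModalAttention(64, 256)(feats, words)

A hybrid block could, for instance, add the word-context map to the self-attended features before the next upsampling stage, as above; how HAGAN actually fuses the two streams is not specified in the abstract.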
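Spectral normalization, the training stabilizer the abstract credits, is available off the shelf in PyTorch as torch.nn.utils.spectral_norm. Below is a minimal sketch of a discriminator wrapped with it; the architecture is an illustrative assumption, since the paper's exact network is not given in the abstract.

import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    # Each conv is wrapped with spectral_norm, which rescales the weight
    # by its largest singular value so the layer stays roughly 1-Lipschitz;
    # this is what keeps discriminator gradients from exploding.
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(in_ch, base, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(base, base * 2, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2, inplace=True),
            spectral_norm(nn.Conv2d(base * 2, 1, 4)),  # real/fake logit map
        )

    def forward(self, x):
        return self.net(x)

logits = SNDiscriminator()(torch.randn(2, 3, 64, 64))  # -> (2, 1, 13, 13)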
