Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds
arXiv (2023)
Abstract
Devising deep latent variable models for multi-modal data has been a
long-standing theme in machine learning research. Multi-modal Variational
Autoencoders (VAEs) have been a popular generative model class that learns
latent representations that jointly explain multiple modalities. Various
objective functions for such models have been suggested, often motivated as
lower bounds on the multi-modal data log-likelihood or from
information-theoretic considerations. To encode latent variables from different
modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE)
aggregation schemes have been routinely used and shown to yield different
trade-offs, for instance, regarding their generative quality or consistency
across multiple modalities. In this work, we consider a variational bound that
can tightly approximate the data log-likelihood. We develop more flexible
aggregation schemes that generalize PoE or MoE approaches by combining encoded
features from different modalities based on permutation-invariant neural
networks. Our numerical experiments illustrate trade-offs for multi-modal
variational bounds and various aggregation schemes. We show that tighter
variational bounds and more flexible aggregation models can become beneficial
when one wants to approximate the true joint distribution over observed
modalities and latent variables in identifiable models.
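
To make the aggregation schemes mentioned above concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the architecture, so the diagonal Gaussian experts, the standard normal prior expert, the sum-pooling (Deep Sets style) network, and all names (poe_gaussian, sum_pool_aggregate, W_phi, W_rho) are illustrative assumptions. The first function shows the standard closed form for a product of Gaussian experts; the second shows one simple way to build a permutation-invariant aggregator over per-modality features.

```python
import numpy as np


def poe_gaussian(mus, logvars):
    """Product of diagonal Gaussian experts, with an assumed N(0, I) prior expert.

    mus, logvars: arrays of shape (M, D), one row per observed modality.
    Returns the mean and variance of the resulting Gaussian, each of shape (D,).
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    precisions = np.exp(-logvars)                    # 1 / sigma^2 per expert
    total_precision = 1.0 + precisions.sum(axis=0)   # prior contributes precision 1
    mu = (precisions * mus).sum(axis=0) / total_precision
    var = 1.0 / total_precision
    return mu, var


def sum_pool_aggregate(features, W_phi, W_rho):
    """Hypothetical permutation-invariant aggregator: rho(sum_m phi(h_m)).

    features: array of shape (M, F), one encoded feature vector per modality.
    W_phi:    array of shape (F, H), weights of the per-modality map phi.
    W_rho:    array of shape (H, 2 * D), weights of the post-pooling map rho.
    Returns the mean and log-variance of the encoding distribution, each (D,).
    """
    features = np.asarray(features, dtype=float)
    embedded = np.tanh(features @ W_phi)   # phi applied to each modality separately
    pooled = embedded.sum(axis=0)          # summation ignores the modality order
    out = pooled @ W_rho
    mu, logvar = np.split(out, 2)
    return mu, logvar


# Usage: three modalities with 4-dimensional features and a 2-dimensional latent.
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))
W_phi = rng.normal(size=(4, 8))
W_rho = rng.normal(size=(8, 4))

mu_a, logvar_a = sum_pool_aggregate(features, W_phi, W_rho)
mu_b, logvar_b = sum_pool_aggregate(features[[2, 0, 1]], W_phi, W_rho)
same_encoding = np.allclose(mu_a, mu_b) and np.allclose(logvar_a, logvar_b)

poe_mu, poe_var = poe_gaussian([[1.0], [3.0]], [[0.0], [0.0]])
```

Because the pooling step is a sum, reordering the modalities leaves the output unchanged, and any subset of modalities can be passed in by dropping rows. A mixture-of-experts scheme would instead average the per-modality encoding densities rather than multiply them.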