[Paper Review] VAE (Variational AutoEncoder)

4 minute read


This post reviews the VAE paper, Auto-Encoding Variational Bayes.

Citations

VAE: Auto-Encoding Variational Bayes - Paper Review

Thumbnail image: towardsdatascience

Introduction

How can we perform efficient approximate inference and learning with directed probabilistic models whose continuous latent variables and/or parameters have intractable posterior distributions?

The answer lies in Variational Bayesian methods, which involve the optimization of approximations to intractable posterior probabilities.

Variational Bayesian (Appendix F)

Marginal Likelihood: the sum of a KL divergence term and the variational lower bound.

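
Written per datapoint, this decomposition (the paper's Eq. 1) is:

```latex
\log p_{\theta}(\mathbf{x}^{(i)})
  = D_{KL}\left( q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p_{\theta}(\mathbf{z} \mid \mathbf{x}^{(i)}) \right)
  + \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
```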

Variational Lower Bound

Variational lower bound to the marginal likelihood:

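
In the paper's notation, the lower bound is:

```latex
\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
  = \mathbb{E}_{q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)})}\left[ -\log q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)}) + \log p_{\theta}(\mathbf{x}^{(i)}, \mathbf{z}) \right]
```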

Monte Carlo estimate of the variational lower bound:

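
Drawing samples z^(l) from q_phi(z|x^(i)), a plain Monte Carlo estimate of this expectation is:

```latex
\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
  \simeq \frac{1}{L} \sum_{l=1}^{L}
    \left[ \log p_{\theta}(\mathbf{x}^{(i)}, \mathbf{z}^{(l)}) - \log q_{\phi}(\mathbf{z}^{(l)} \mid \mathbf{x}^{(i)}) \right]
```

The reparameterization described below makes this estimate differentiable with respect to phi.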

For more detail on variational Bayesian methods, see Appendix F of the paper.

Stochastic Gradient Variational Bayes (SGVB)

The SGVB (Stochastic Gradient Variational Bayes) estimator is a scalable estimator of the variational lower bound. By reparameterizing the latent variable, it yields a differentiable Monte Carlo estimate of the bound that can be optimized with standard stochastic gradient ascent, enabling efficient approximate posterior inference in almost any model with continuous latent variables and/or parameters.
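
Concretely, with the reparameterization z = g_phi(eps, x), the paper's generic estimator (denoted L̃^A) is:

```latex
\tilde{\mathcal{L}}^{A}(\theta, \phi; \mathbf{x}^{(i)})
  = \frac{1}{L} \sum_{l=1}^{L}
    \left[ \log p_{\theta}(\mathbf{x}^{(i)}, \mathbf{z}^{(i,l)}) - \log q_{\phi}(\mathbf{z}^{(i,l)} \mid \mathbf{x}^{(i)}) \right],
  \qquad
  \mathbf{z}^{(i,l)} = g_{\phi}(\boldsymbol{\epsilon}^{(i,l)}, \mathbf{x}^{(i)}),
  \quad \boldsymbol{\epsilon}^{(i,l)} \sim p(\boldsymbol{\epsilon})
```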

Auto-Encoding Variational Bayes (AEVB)

The AEVB (Auto-Encoding Variational Bayes) algorithm makes inference and learning particularly efficient by using the SGVB estimator to optimize a recognition model. Once trained, the recognition model allows approximate posterior inference with simple ancestral sampling, so the model parameters can be learned without expensive iterative inference schemes such as MCMC per datapoint.
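
For a dataset of N datapoints, the bound for the whole dataset can be estimated from a minibatch of M datapoints:

```latex
\mathcal{L}(\theta, \phi; \mathbf{X})
  \simeq \tilde{\mathcal{L}}^{M}(\theta, \phi; \mathbf{X}^{M})
  = \frac{N}{M} \sum_{i=1}^{M} \tilde{\mathcal{L}}(\theta, \phi; \mathbf{x}^{(i)})
```

Algorithm 1 of the paper computes gradients of this estimator with respect to (theta, phi) and updates both parameter sets with a stochastic gradient method such as SGD or Adagrad.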

Method


Problem Scenario

Consider a dataset X = {x^(i)}_{i=1}^N consisting of N i.i.d. samples of a (continuous or discrete) variable x, assumed to be generated by the following two-step process:


  1. A latent variable z^(i) is generated from the prior distribution p_theta(z).
  2. The datapoint x^(i) is generated from the conditional distribution p_theta(x|z^(i)).
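
Evaluating or differentiating the marginal likelihood would require the integral

```latex
p_{\theta}(\mathbf{x}) = \int p_{\theta}(\mathbf{z}) \, p_{\theta}(\mathbf{x} \mid \mathbf{z}) \, d\mathbf{z}
```

which is intractable in general, and so is the true posterior p_theta(z|x) = p_theta(x|z) p_theta(z) / p_theta(x).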


The paper is interested in exactly this common case: the marginal likelihood and posterior are intractable, and the dataset is too large for batch optimization or for costly per-datapoint sampling loops (e.g., Monte Carlo EM).

The research proposes solutions for three problems:

  1. Efficient approximate ML or MAP estimation for the parameters theta. These parameters can be of interest themselves for analyzing natural processes and generating artificial data.
  2. Efficient approximate posterior inference of the latent variable z given an observed value x for chosen parameters theta. This is useful for coding or data representation tasks.
  3. Efficient approximate marginal inference of the variable x. This allows for various inference tasks where a prior over x is required, such as image denoising, inpainting, and super-resolution in computer vision.

To address these problems, the study introduces a recognition model q_phi(z|x) as an approximation to the intractable true posterior p_theta(z|x).

Method Summary

The recognition model parameters phi are learned jointly with the generative model parameters theta. Given a datapoint x, the probabilistic encoder q_phi(z|x) produces a distribution (e.g., a Gaussian) over the possible values of the code z from which x could have been generated; the probabilistic decoder p_theta(x|z) in turn produces a distribution over the possible values of x given z.


The Variational Bound

Marginal likelihood of an individual datapoint, log p_theta(x^(i)):


Right-hand side (RHS):

  1. KL divergence of the approximate from the true posterior (non-negative).
  2. L(theta, phi; x^(i)), the variational lower bound on the marginal likelihood of datapoint i.
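
Because the KL term is non-negative, the second term really is a lower bound:

```latex
\log p_{\theta}(\mathbf{x}^{(i)})
  = D_{KL}\left( q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p_{\theta}(\mathbf{z} \mid \mathbf{x}^{(i)}) \right)
  + \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
  \;\geq\; \mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
```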



The objective is to differentiate and optimize the lower bound:


Maximizing this bound corresponds to maximizing a lower bound on the evidence p_theta(x) that appears in Bayes' theorem, which is why L is known as the Evidence Lower Bound (ELBO). The VAE loss function is the negative of this ELBO.
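
Written in the form that is actually optimized, the ELBO splits into a regularization term and an expected reconstruction term:

```latex
\mathcal{L}(\theta, \phi; \mathbf{x}^{(i)})
  = -D_{KL}\left( q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p_{\theta}(\mathbf{z}) \right)
  + \mathbb{E}_{q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)})}\left[ \log p_{\theta}(\mathbf{x}^{(i)} \mid \mathbf{z}) \right]
```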


The SGVB Estimator and AEVB Algorithm

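
When the KL term can be computed analytically (as it can for the Gaussian case below), only the reconstruction term needs sampling, giving the paper's second estimator, L̃^B:

```latex
\tilde{\mathcal{L}}^{B}(\theta, \phi; \mathbf{x}^{(i)})
  = -D_{KL}\left( q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p_{\theta}(\mathbf{z}) \right)
  + \frac{1}{L} \sum_{l=1}^{L} \log p_{\theta}(\mathbf{x}^{(i)} \mid \mathbf{z}^{(i,l)}),
  \qquad \mathbf{z}^{(i,l)} = g_{\phi}(\boldsymbol{\epsilon}^{(i,l)}, \mathbf{x}^{(i)})
```

This estimator typically has lower variance than the generic estimator L̃^A.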

Reparameterization Trick

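
Instead of sampling z directly from q_phi(z|x), z is written as a deterministic, differentiable function of phi, x, and an auxiliary noise variable eps. For the Gaussian posterior used in the VAE:

```latex
\mathbf{z}^{(i,l)} = g_{\phi}(\boldsymbol{\epsilon}^{(l)}, \mathbf{x}^{(i)})
  = \boldsymbol{\mu}^{(i)} + \boldsymbol{\sigma}^{(i)} \odot \boldsymbol{\epsilon}^{(l)},
  \qquad \boldsymbol{\epsilon}^{(l)} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})
```

The randomness is moved into eps, so gradients can flow through mu^(i) and sigma^(i) back to the encoder parameters phi.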

Two assumptions are needed to compute the regularization (KL) term in closed form:

  1. The approximate posterior produced by the encoder, q_phi(z|x), is a multivariate normal distribution with a diagonal covariance matrix.
  2. The prior over z, p(z), is a standard normal distribution with mean 0 and standard deviation 1.

Under these assumptions, the KL divergence between q_phi(z|x) and p(z) has a closed-form expression; it acts as a regularizer that keeps the approximate posterior close to the prior and can be differentiated directly.

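
The closed form, derived in the paper's Appendix B (J is the dimensionality of z), is:

```latex
-D_{KL}\left( q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)}) \,\|\, p(\mathbf{z}) \right)
  = \frac{1}{2} \sum_{j=1}^{J}
    \left( 1 + \log\!\left( (\sigma_{j}^{(i)})^{2} \right) - (\mu_{j}^{(i)})^{2} - (\sigma_{j}^{(i)})^{2} \right)
```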

Thus the regularization term is an analytic, differentiable function of the encoder outputs; only the reconstruction term has to be estimated by sampling.

Variational Auto-Encoder (VAE)

The variational approximate posterior is a multivariate Gaussian with a diagonal covariance structure.

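
In the paper's notation:

```latex
\log q_{\phi}(\mathbf{z} \mid \mathbf{x}^{(i)})
  = \log \mathcal{N}\!\left( \mathbf{z};\, \boldsymbol{\mu}^{(i)},\, (\boldsymbol{\sigma}^{(i)})^{2} \mathbf{I} \right)
```

where the mean mu^(i) and standard deviation sigma^(i) are outputs of the encoder MLP applied to x^(i).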

The decoder log-likelihood log p_theta(x^(i) | z^(i,l)) is modeled with a Bernoulli MLP (for binary data) or a Gaussian MLP (for real-valued data), depending on the data type.
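
Putting the pieces together, below is a minimal PyTorch-style sketch (not the authors' code) of a VAE with a Gaussian encoder and a Bernoulli-MLP decoder. It uses the analytic KL term, a single sample (L = 1) for the reconstruction term, and Adam instead of the SGD/Adagrad used in the paper; all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        # Encoder q_phi(z|x): one tanh hidden layer -> mean and log-variance of z
        self.enc_h = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x|z): one tanh hidden layer -> Bernoulli means y
        self.dec_h = nn.Linear(z_dim, h_dim)
        self.dec_y = nn.Linear(h_dim, x_dim)

    def encode(self, x):
        h = torch.tanh(self.enc_h(x))
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I); differentiable w.r.t. mu, logvar
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def decode(self, z):
        h = torch.tanh(self.dec_h(z))
        return torch.sigmoid(self.dec_y(h))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def negative_elbo(x, y, mu, logvar):
    # Reconstruction term: Bernoulli log-likelihood (binary cross-entropy)
    recon = F.binary_cross_entropy(y, x, reduction="sum")
    # Analytic KL term: -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kld  # minimizing this maximizes the ELBO

# Usage sketch: one gradient step on a random stand-in minibatch of binarized images
model = VAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(128, 784).round()
y, mu, logvar = model(x)
loss = negative_elbo(x, y, mu, logvar)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```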

Appendix C: MLPs as Probabilistic Encoders and Decoders

Bernoulli MLP:

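
For binary data, the decoder MLP outputs Bernoulli probabilities y:

```latex
\log p(\mathbf{x} \mid \mathbf{z})
  = \sum_{i=1}^{D} x_{i} \log y_{i} + (1 - x_{i}) \log (1 - y_{i}),
  \qquad
  \mathbf{y} = f_{\sigma}\!\left( \mathbf{W}_{2} \tanh(\mathbf{W}_{1} \mathbf{z} + \mathbf{b}_{1}) + \mathbf{b}_{2} \right)
```

where f_sigma denotes the element-wise sigmoid and D is the data dimensionality.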


Gaussian MLP:

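
For real-valued data, the decoder MLP outputs the mean and (diagonal) log-variance of a Gaussian; schematically:

```latex
\log p(\mathbf{x} \mid \mathbf{z})
  = \log \mathcal{N}\!\left( \mathbf{x};\, \boldsymbol{\mu},\, \boldsymbol{\sigma}^{2} \mathbf{I} \right),
  \qquad
  \mathbf{h} = \tanh(\mathbf{W}_{3} \mathbf{z} + \mathbf{b}_{3}),
  \quad \boldsymbol{\mu} = \mathbf{W}_{4} \mathbf{h} + \mathbf{b}_{4},
  \quad \log \boldsymbol{\sigma}^{2} = \mathbf{W}_{5} \mathbf{h} + \mathbf{b}_{5}
```

The same form is used for the Gaussian encoder, with the roles of x and z swapped.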

Experiments

MNIST & Frey Face Datasets

Generative models of images are trained on the MNIST and Frey Face datasets, and AEVB is compared with the wake-sleep algorithm in terms of the optimized variational lower bound.

Likelihood Lower Bound

AEVB converges considerably faster and reaches a better lower bound than wake-sleep across different dimensionalities of the latent space.

Marginal Likelihood

For a low-dimensional latent space, the estimated marginal likelihood of AEVB is compared against the wake-sleep algorithm and Monte Carlo EM (MCEM).

Visualization of High-Dimensional Data

With a two-dimensional latent space, the learned data manifolds for MNIST and Frey Face can be visualized by mapping linearly spaced coordinates in latent space through the decoder.

Conclusion & Future Work

The SGVB estimator and the AEVB algorithm make variational inference with continuous latent variables efficient and scalable, with both theoretical advantages and favorable experimental results.

Future work includes learning hierarchical generative models, in particular with deep neural networks such as convolutional networks used as encoders and decoders. Other directions are applying these methods to dynamic Bayesian networks for modeling time-series data, extending SGVB to optimize global parameters, and exploring supervised models with latent variables for learning complicated noise distributions, which could improve robustness and predictive performance.