Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

1 Introduction

This work uses a cascade of convolutional networks (ConvNets) within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, an independent ConvNet is trained with the Generative Adversarial Network (GAN) approach to capture image structure at that particular scale. Instead of generating an image in a single pass, generation proceeds in stages: first, a random vector is sampled and fed to a ConvNet, which outputs an image at the coarsest level. The next stage samples the structure of the image at its level, conditioned on the image generated so far. Subsequent levels continue this process, always conditioning on the output from the previous scale, until the final level is reached.

 

2 Approach

2.1 Generative Adversarial Network

The method pits two networks against one another: a generative model G that captures the data distribution, and a discriminative model D that distinguishes between samples drawn from G and samples from the image data. G takes a noise vector z drawn from a distribution Pnoise(z) and outputs an image h'. D takes as input an image stochastically chosen to be either h', as generated by G, or h, a real image drawn from the training distribution Pdata(h). D outputs a scalar probability, which is trained to be high if the input was real and low if it was generated by G.
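As a small numeric illustration (not from the paper), the two losses implied by this objective can be written down directly; the generator loss below uses the common non-saturating variant from the original GAN paper:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Per-sample GAN losses for the minimax objective described above.

    d_real: D's probability that a real image h is real.
    d_fake: D's probability that a generated image h' = G(z) is real.
    """
    # D is trained to push d_real toward 1 and d_fake toward 0.
    loss_d = -(np.log(d_real) + np.log(1.0 - d_fake))
    # Non-saturating generator loss (Goodfellow et al., 2014):
    # G is trained to make D assign high probability to its samples.
    loss_g = -np.log(d_fake)
    return loss_d, loss_g

# A confident, correct discriminator incurs a low loss ...
loss_d_good, _ = gan_losses(d_real=0.9, d_fake=0.1)
# ... while a maximally unsure one (outputting 0.5 everywhere) does not.
loss_d_unsure, _ = gan_losses(d_real=0.5, d_fake=0.5)
print(loss_d_good < loss_d_unsure)  # True
```

Training alternates between minimizing loss_d over D's parameters and loss_g over G's parameters.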

min_G max_D  E_{h ~ Pdata(h)}[ log D(h) ] + E_{z ~ Pnoise(z)}[ log(1 - D(G(z))) ]     (1)

 

2.2 Laplacian pyramid 

Let d(.) be a downsampling operation: for an image I of shape (W, H), d(I) has shape (W/2, H/2).

Let u(.) be an upsampling operation: for an image I of shape (W/2, H/2), u(I) has shape (W, H).
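As a concrete stand-in for these operators (an illustration only; the paper uses smoothed Gaussian filtering), simple 2x2 average-pooling and nearest-neighbor replication already have the required shape behaviour:

```python
import numpy as np

def d(img):
    """Downsample by averaging non-overlapping 2x2 blocks: (W, H) -> (W/2, H/2)."""
    w, h = img.shape
    return img.reshape(w // 2, 2, h // 2, 2).mean(axis=(1, 3))

def u(img):
    """Upsample by nearest-neighbor replication: (W/2, H/2) -> (W, H)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

I = np.arange(24.0).reshape(4, 6)   # a toy 4x6 "image"
print(d(I).shape)     # (2, 3)
print(u(d(I)).shape)  # (4, 6)
```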

The coefficients hk at each level k of the Laplacian pyramid are constructed by taking the difference between adjacent levels of the Gaussian pyramid, upsampling the smaller one with u(.):

hk = Ik - u(Ik+1)     (3)

where Ik denotes level k of the Gaussian pyramid of I (with I0 = I).
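Building the pyramid and inverting it then takes only a few lines. The sketch below (reusing the simple averaging/replication stand-ins for d and u, an assumption, since the paper uses Gaussian filtering) verifies that the original image is recovered exactly from the coefficients:

```python
import numpy as np

def d(img):
    w, h = img.shape
    return img.reshape(w // 2, 2, h // 2, 2).mean(axis=(1, 3))

def u(img):
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [h0, ..., h_{K-1}, I_K]: difference coefficients plus the final low-pass image."""
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = d(current)
        pyramid.append(current - u(smaller))  # hk = Ik - u(Ik+1), Equation (3)
        current = smaller
    pyramid.append(current)  # coarsest Gaussian level I_K
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: Ik = hk + u(Ik+1), proceeding coarse to fine."""
    current = pyramid[-1]
    for hk in reversed(pyramid[:-1]):
        current = hk + u(current)
    return current

I = np.random.rand(16, 16)
pyr = build_laplacian_pyramid(I, levels=3)
print(np.allclose(reconstruct(pyr), I))  # True
```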

 

2.3 Laplacian Generative Adversarial Network

We have a set of generative ConvNet models {G0, G1, ..., GK}, each of which captures the distribution of the coefficients hk of natural images at a different level of the Laplacian pyramid.

I'k = u(I'k+1) + h'k = u(I'k+1) + Gk(zk, u(I'k+1)),   starting with I'K+1 = 0 so that I'K = GK(zK)     (4)

Note that the models at all levels except the final one are conditional generative models that take an upsampled version of the current image I'k+1 as a conditioning variable, in addition to the noise vector zk.
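The coarse-to-fine sampling procedure can be sketched with stub generators standing in for the trained Gk (the stubs below just emit noise of the right shape; they are placeholders, not the paper's models):

```python
import numpy as np

def u(img):
    """Nearest-neighbor upsampling stand-in: (W/2, H/2) -> (W, H)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def stub_generator(z, conditioning=None):
    """Placeholder for a trained Gk.

    At the coarsest level (no conditioning) it invents an image from noise alone;
    at finer levels it returns coefficients shaped like the upsampled image.
    """
    rng = np.random.default_rng(int(z.sum() * 1e6) % (2**32))
    if conditioning is None:
        return rng.standard_normal((4, 4))            # coarsest image I'_K
    return 0.1 * rng.standard_normal(conditioning.shape)  # coefficients h'_k

def lapgan_sample(num_levels):
    """Generate an image coarse-to-fine: I'k = u(I'k+1) + Gk(zk, u(I'k+1))."""
    image = stub_generator(np.random.rand(8))         # I'_K = G_K(z_K)
    for _ in range(num_levels):
        upsampled = u(image)
        image = upsampled + stub_generator(np.random.rand(8), upsampled)
    return image

print(lapgan_sample(num_levels=2).shape)  # (16, 16)
```

Each pass through the loop doubles the resolution, so two refinement levels take the 4x4 seed image to 16x16.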


The generative models {G0, G1, ..., GK} are trained using the conditional GAN approach at each level of the pyramid. We construct a Laplacian pyramid from each training image I. At each level we make a stochastic decision to either (i) construct the coefficients hk using Equation (3), or (ii) generate them using Gk.


Gk takes as input the low-pass image lk = u(Ik+1), as well as the noise vector zk, and outputs generated coefficients h'k. Dk takes as input hk or h'k along with the low-pass image lk, and predicts whether the coefficients were real or generated.
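This per-level stochastic training decision can be sketched as follows (again with a hypothetical stub generator; in the real system Dk's input pair would feed a trained discriminator):

```python
import numpy as np

def d(img):
    w, h = img.shape
    return img.reshape(w // 2, 2, h // 2, 2).mean(axis=(1, 3))

def u(img):
    return img.repeat(2, axis=0).repeat(2, axis=1)

def stub_generator(z, low_pass):
    """Placeholder for Gk: coefficients conditioned on the low-pass image lk."""
    return 0.1 * np.random.standard_normal(low_pass.shape)

def training_example(I_k, rng):
    """Build one (coefficients, low_pass, label) triple for Dk at level k.

    With probability 1/2 the coefficients are real (Equation (3));
    otherwise they are produced by the generator. label=1 means real.
    """
    I_next = d(I_k)          # next Gaussian level I_{k+1}
    l_k = u(I_next)          # low-pass image lk = u(I_{k+1})
    if rng.random() < 0.5:
        return I_k - l_k, l_k, 1             # real hk = Ik - u(I_{k+1})
    z_k = rng.standard_normal(16)
    return stub_generator(z_k, l_k), l_k, 0  # generated h'k

rng = np.random.default_rng(0)
coeffs, low_pass, label = training_example(np.random.rand(8, 8), rng)
print(coeffs.shape, low_pass.shape, label in (0, 1))  # (8, 8) (8, 8) True
```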