CHIMLE

A General-purpose Framework for Multimodal Conditional Image Synthesis

Shichong Peng

APEX Lab
Simon Fraser University

Alireza Moazeni

APEX Lab
Simon Fraser University

Ke Li

APEX Lab
Simon Fraser University,
Google

Links

Paper

Code (GitHub)


16x Super-Resolution

We use our method to increase the width and height of input images by a factor of 16. Toggle between results from our method (CHIMLE) and those of leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), diffusion-based general-purpose methods (DDRM and NDM), a task-specific method (RFB-ESRGAN) and conditional IMLE (cIMLE).


Image Colourization

We use our method to colourize a grayscale image. Toggle between results from our method (CHIMLE) and those of leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), diffusion-based general-purpose methods (DDRM and NDM), a task-specific method (InstColorization) and conditional IMLE (cIMLE).


Image Decompression

We use our method to recover plausible images from a heavily compressed input. Toggle between results from our method (CHIMLE) and those of leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), diffusion-based general-purpose methods (DDRM and NDM), a task-specific method (DnCNN) and conditional IMLE (cIMLE).


Night-to-Day

We use our method to convert a nighttime scene to daytime. Toggle between results from our method (CHIMLE) and those of leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), a diffusion-based general-purpose method (NDM) and conditional IMLE (cIMLE).


IMLE vs GAN

Implicit Maximum Likelihood Estimation (IMLE) trains a generator without a discriminator. Below is a conceptual illustration of how IMLE works.


Compared to GANs, IMLE has two key advantages: it avoids mode collapse and it avoids training instability. The animation below shows what happens when training a GAN.


As shown above, a GAN encourages every generated sample to be similar to some real data point. IMLE flips this direction: it instead ensures that every real data point has a similar generated sample.
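This intuition can be written down concretely. Below is a simplified, unconditional form of the IMLE objective (the conditional, hierarchical variant used by CHIMLE builds on the same structure); here $x_1, \dots, x_n$ are the real data points, $G_\theta$ is the generator and $z_1, \dots, z_m$ are randomly drawn latent codes:

$$\min_\theta \; \mathbb{E}_{z_1,\dots,z_m \sim \mathcal{N}(0, I)} \left[ \sum_{i=1}^{n} \min_{j \in \{1,\dots,m\}} \big\| x_i - G_\theta(z_j) \big\|_2^2 \right]$$

Because the inner minimum is taken over generated samples for each real data point, no data point can be left without a nearby sample, which is exactly the flipped direction described above.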

Mode Collapse

Below is a comparison of the behaviours of GAN and IMLE over the course of training. Real data points are shown as blue crosses and the probability density of generated samples is shown as a heatmap.

GAN

IMLE

As shown above, the GAN mostly generates samples near the data points at the bottom and largely ignores the data points at the top. In comparison, IMLE generates samples near all data points with similar frequency.

Stable Training

Because IMLE uses a non-adversarial objective, it trains stably.

The output is shown on the left and the loss over time is shown on the right. The output quality improves steadily over the course of training.
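To make the non-adversarial objective concrete, here is a minimal toy sketch of one IMLE training step in PyTorch. It is illustrative only and not the CHIMLE implementation: the generator G, the latent dimension, the pool size and the data shapes are placeholder assumptions.

import torch

def imle_step(G, x, optimizer, num_samples=64, z_dim=32):
    """One toy IMLE step: pull each real point's nearest generated sample towards it."""
    # Draw a pool of latent codes and generate candidate samples (no gradients needed yet).
    z = torch.randn(num_samples, z_dim, device=x.device)
    with torch.no_grad():
        candidates = G(z)                                  # (num_samples, data_dim)
    # For each real data point, find the index of its nearest generated sample.
    nearest = torch.cdist(x, candidates).argmin(dim=1)     # (batch_size,)
    # Regenerate only the selected samples with gradients enabled, and minimise the
    # distance from each real point to its nearest sample. Every real data point
    # contributes to the loss, so no mode can be ignored.
    selected = G(z[nearest])                               # (batch_size, data_dim)
    loss = ((selected - x) ** 2).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The loss is a plain squared distance with no discriminator and no min-max game, so each gradient step decreases a fixed objective, which is what makes training stable.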


Citation

@inproceedings{peng2022chimle,
   title={CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis},
   author={Shichong Peng and Alireza Moazeni and Ke Li},
   booktitle={Advances in Neural Information Processing Systems},
   year={2022}
}