A General-purpose Framework for Multimodal Conditional Image Synthesis
Shichong Peng
APEX Lab
Simon Fraser University
Alireza Moazeni
APEX Lab
Simon Fraser University
Ke Li
APEX Lab
Simon Fraser University,
Google
We use our method to increase the width and height of input images by a factor of 16. We compare our method (CHIMLE) against leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), diffusion-based general-purpose methods (DDRM and NDM), a task-specific method (RFB-ESRGAN) and conditional IMLE (cIMLE).
We use our method to colourize a grayscale image. We compare our method (CHIMLE) against leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), diffusion-based general-purpose methods (DDRM and NDM), a task-specific method (InstColorization) and conditional IMLE (cIMLE).
We use our method to recover plausible images from a heavily compressed image. We compare our method (CHIMLE) against leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), diffusion-based general-purpose methods (DDRM and NDM), a task-specific method (DnCNN) and conditional IMLE (cIMLE).
We use our method to convert a nighttime scene to daytime. We compare our method (CHIMLE) against leading baselines, including GAN-based general-purpose methods (BicycleGAN, MSGAN, DivCo and MoNCE), a diffusion-based general-purpose method (NDM) and conditional IMLE (cIMLE).
Implicit Maximum Likelihood Estimation (IMLE) trains a generator without a discriminator. Below is a conceptual illustration of how IMLE works.
Compared to GANs, IMLE has two key advantages: it avoids mode collapse, and it avoids training instability.
The animation below shows what happens when training a GAN.
As shown above, a GAN encourages every generated sample to be similar to some real data point.
IMLE, on the other hand, flips the direction: it ensures that every real data point has some similar generated sample.
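The direction described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `imle_loss`, the use of squared Euclidean distance, and the toy 2D points are all hypothetical choices made for the example.

```python
import numpy as np

def imle_loss(real, generated):
    """Mean squared distance from each real point to its nearest generated sample.

    Hypothetical helper for illustration; squared Euclidean distance is an assumption.
    """
    # Pairwise squared distances, shape (n_real, n_generated).
    diffs = real[:, None, :] - generated[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # For every real point, select the closest generated sample.
    nearest = np.argmin(sq_dists, axis=1)
    loss = sq_dists[np.arange(len(real)), nearest].mean()
    return loss, nearest

# Toy data: the generated samples sit near the first real point only.
real = np.array([[0.0, 0.0], [10.0, 10.0]])
generated = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
loss, nearest = imle_loss(real, generated)
print(loss, nearest)
```

In this toy case the real point at (10, 10) has no nearby generated sample, so it dominates the loss. Minimizing this quantity therefore pushes the generator toward the data points it currently fails to cover.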
Below is a comparison of the behaviours of GAN and IMLE over the course of training. Real data points are shown as blue crosses and the probability density of generated samples is shown as a heatmap.
Panels: GAN and IMLE.
As shown above, the GAN usually generates data points at the bottom and largely ignores the data points at the top. In comparison, IMLE can generate all data points with similar frequency.
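The asymmetry between the two matching directions can be checked numerically on toy data. The sketch below is an assumption-laden illustration: the sample-to-data distance is only a simplified proxy for "every generated sample resembles some real point" and is not the actual GAN objective, and the helper `mean_nearest_sq_dist` and the cluster coordinates are hypothetical.

```python
import numpy as np

def mean_nearest_sq_dist(queries, targets):
    """Average squared distance from each query point to its nearest target."""
    diffs = queries[:, None, :] - targets[None, :, :]
    return np.sum(diffs ** 2, axis=-1).min(axis=1).mean()

# Two clusters of real data: one at the bottom, one at the top.
real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 5.0], [1.0, 5.0]])
# A collapsed generator that only produces samples near the bottom cluster.
collapsed = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]])

# Proxy for the GAN-like direction: each generated sample vs. nearest real point.
sample_to_data = mean_nearest_sq_dist(collapsed, real)
# IMLE-like direction: each real point vs. nearest generated sample.
data_to_sample = mean_nearest_sq_dist(real, collapsed)
print(sample_to_data, data_to_sample)
```

The collapsed samples score well in the sample-to-data direction because each one lies close to a real point, while the data-to-sample direction is large because the top cluster has no nearby samples. Only the second quantity penalizes the missing mode.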
Because IMLE uses a non-adversarial objective, it trains stably.
The output is shown on the left and the loss over time is shown on the right. The output quality improves steadily over the course of training.
@inproceedings{peng2022chimle,
title={CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis},
author={Shichong Peng and Alireza Moazeni and Ke Li},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}