A Unified Architecture for Multimodal Image Synthesis
Shichong Peng
APEX Lab
Simon Fraser University
Alireza Moazeni
APEX Lab
Simon Fraser University
Ke Li
APEX Lab
Simon Fraser University
Below we show the advantages of multimodal prediction compared to unimodal prediction.
We propose a modular architecture that captures different levels of detail in a coarse-to-fine manner.
We propose hierarchical sampling, which is a more efficient sampling strategy for Implicit Maximum Likelihood Estimation (IMLE).
We use our method to increase the width and height of input images by a factor of 16. Toggle between our results (CAM-Net) and those of RFB-ESRGAN and conditional IMLE (cIMLE).
We use our method to colourize a grayscale image. Toggle between our results (CAM-Net) and those of Colorful Image Colorization, Let there be Color, Learning Representations for Automatic Colorization and cIMLE.
We use our method to generate diverse images from scene layouts. Toggle between our results (CAM-Net) and those of cIMLE.
We use our method to recover plausible images from a heavily compressed image. Toggle between our results (CAM-Net) and those of DnCNN and cIMLE.
The marginal distribution captures the variability in one variable, whereas the joint distribution captures variability across multiple variables. The marginal distribution alone does not capture correlations between variables. Below we show a case where modelling just the marginal distributions leads to spurious samples.
The joint distribution is visualized at the centre, whereas the marginal distributions are visualized around the boundary. Red points are samples from the joint distribution, and pink points are obtained by sampling each variable independently from its marginal distribution. As shown above, pink points may fall outside the high-probability regions of the joint distribution.
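The same effect can be reproduced numerically. The following is a minimal sketch on a made-up toy distribution (two clusters at hypothetical centres (-2, -2) and (2, 2)), not the data used in the figure. Shuffling each coordinate separately keeps both marginals intact but destroys the correlation between them, which is equivalent to sampling from independent marginals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution: two equally likely clusters on the diagonal.
n = 10_000
centres = np.array([[-2.0, -2.0], [2.0, 2.0]])
labels = rng.integers(0, 2, size=n)
joint = centres[labels] + 0.3 * rng.standard_normal((n, 2))

# Independent marginals: permute each coordinate on its own.
# Every coordinate keeps exactly the same values (same marginal),
# but the pairing between x and y is now random.
indep = np.column_stack([
    rng.permutation(joint[:, 0]),
    rng.permutation(joint[:, 1]),
])

def frac_spurious(samples):
    """Fraction of samples whose x and y have opposite signs.

    Such points lie in the off-diagonal quadrants, far from both
    clusters of the joint distribution.
    """
    return float(np.mean(samples[:, 0] * samples[:, 1] < 0))

print("spurious fraction, joint samples:      ", frac_spurious(joint))
print("spurious fraction, independent samples:", frac_spurious(indep))
```

In this toy setting, roughly half of the independently sampled points land in regions where the joint distribution has almost no mass, even though each coordinate on its own is distributed exactly as before.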
In the case of colourization, the colours of nearby pixels are highly correlated. Zhang et al. proposed a method that models only the marginal distributions. Below we compare samples from Zhang et al. with samples from CAM-Net, which models the joint distribution. As shown, samples from the marginal distributions (Zhang et al.) are spatially inconsistent, whereas samples from the joint distribution (CAM-Net) are not.
Below we show a conceptual illustration of how IMLE works.
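For readers who prefer code, the core idea of IMLE can also be sketched on toy data: draw several samples from the generator, match each data point to its nearest sample, and move the matched samples toward the data. The sketch below rests on heavy assumptions: a hypothetical affine generator with parameters `W` and `b`, made-up 2-D data, and fresh samples at every step. It is not the CAM-Net architecture and does not use hierarchical sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bimodal data (hypothetical): two clusters in 2-D.
n = 200
centres = np.array([[-2.0, -2.0], [2.0, 2.0]])
data = centres[rng.integers(0, 2, size=n)] + 0.3 * rng.standard_normal((n, 2))

def nearest_sample_loss(W, b, m=64):
    """Draw m samples from the generator and match each data point
    to its nearest sample.

    Returns the mean squared distance, plus the latent code and the
    generated sample selected for each data point.
    """
    z = rng.standard_normal((m, 2))      # latent codes
    samples = z @ W + b                  # affine generator (assumption)
    d2 = ((data[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)  # (n, m)
    idx = d2.argmin(axis=1)              # nearest sample per data point
    return d2[np.arange(n), idx].mean(), z[idx], samples[idx]

# Start with a generator that only produces points near the origin.
W = 0.1 * np.eye(2)
b = np.zeros(2)
loss_before, _, _ = nearest_sample_loss(W, b)

lr = 0.05
for step in range(500):
    _, z_sel, s_sel = nearest_sample_loss(W, b)
    residual = s_sel - data              # selected sample minus its data point
    # Gradient of the mean squared distance w.r.t. W and b.
    W -= lr * 2.0 * z_sel.T @ residual / n
    b -= lr * 2.0 * residual.mean(axis=0)

loss_after, _, _ = nearest_sample_loss(W, b)
print("mean squared nearest-sample distance before:", loss_before)
print("mean squared nearest-sample distance after: ", loss_after)
```

Because every data point pulls its nearest sample toward itself, no mode of the data is left without a nearby sample. This property is what makes the objective suited to multimodal prediction.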