Semantic Image Synthesis (SIS) is a subclass of image-to-image translation in which a photorealistic image is generated from a semantic layout. State-of-the-art conditional Generative Adversarial Networks (GANs) need a large amount of paired data to accomplish this task, while generic unpaired image-to-image translation frameworks underperform in comparison because they color-code semantic layouts and learn correspondences in appearance instead of semantic content. Starting from the assumption that a high-quality generated image should be segmentable back into its semantic layout, we propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole-image wavelet-based discrimination. Furthermore, to match the high-frequency distribution of real images, we propose a novel generator architecture in the wavelet domain. We test our methodology on three challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
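The abstract names two ingredients that are easy to misread in the abstract: splitting an image into wavelet subbands (used here for discrimination and generation) and a segmentation loss that checks whether a generated image maps back to its layout. The NumPy sketch below illustrates both under stated assumptions. It uses a single-level orthonormal Haar transform, which is one common wavelet choice; the abstract does not say which wavelet USIS uses. The function names `haar_dwt2`, `haar_idwt2` and `segmentation_ce` are hypothetical, and the segmenter, generator and discriminator networks are omitted. This is not the authors' implementation.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (illustrative choice of wavelet).

    img: array of shape (H, W) or (H, W, C) with even H and W.
    Returns (LL, LH, HL, HH), each of shape (H/2, W/2[, C]).
    LL is the low-frequency approximation; LH, HL and HH carry the
    high-frequency detail a wavelet-domain discriminator could inspect.
    (Subband naming conventions for LH/HL differ between libraries.)
    """
    img = np.asarray(img, dtype=np.float64)
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # local average (scaled)
    lh = (a + b - c - d) / 2.0  # top row minus bottom row: horizontal edges
    hl = (a - b + c - d) / 2.0  # left column minus right column: vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution array from its subbands.

    A generator working in the wavelet domain could predict the four subbands
    and use a transform like this to produce the final image (assumption).
    """
    h, w = ll.shape[:2]
    out = np.empty((2 * h, 2 * w) + ll.shape[2:], dtype=np.float64)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out


def segmentation_ce(logits, labels):
    """Mean per-pixel cross-entropy between segmenter logits and a semantic layout.

    logits: (C, H, W) scores predicted by a segmenter on a generated image.
    labels: (H, W) integer class map, i.e. the layout the image was generated from.
    A low value means the generated image "segments back" to its layout.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = log_probs[labels, rows, cols]
    return float(-picked.mean())
```

Because the Haar transform is orthonormal, it preserves the image's total energy and is exactly invertible, so nothing is lost by moving between the pixel and wavelet domains; the high-frequency subbands simply make fine texture explicit. In practice the cross-entropy would be computed with a deep-learning framework so gradients can flow from the segmenter back into the generator.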

IEEE International 2022
Benchmark results for USIS-Wavelet (task: Image-to-Image Translation; mIoU: higher is better; FID: lower is better)

Dataset                        mIoU    Global rank    FID      Global rank
ADE20K Labels-to-Photos        16.95   #11            34.5     #10
Cityscapes Labels-to-Photo     42.32   #14            50.14    #5
COCO-Stuff Labels-to-Photos    13.4    #9             28.6     #11

Methods