GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)

11 Jun 2021 · Min Jin Chong, David Forsyth

We show how to learn a map that takes a content code, derived from a face image, and a randomly chosen style code to an anime image. We derive an adversarial loss from our simple and effective definitions of style and content. This adversarial loss guarantees the map is diverse: a very wide range of anime can be produced from a single content code. Under plausible assumptions, the map is not just diverse, but also correctly represents the probability of an anime, conditioned on an input face. In contrast, current multimodal generation procedures cannot capture the complex styles that appear in anime. Extensive quantitative experiments support the idea that the map is correct. Extensive qualitative results show that the method can generate a much more diverse range of styles than SOTA comparisons. Finally, we show that our formalization of content and style allows us to perform video-to-video translation without ever training on videos.
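
The abstract describes the core interface: a content encoder maps a face image to a content code, a style code is drawn at random, and a decoder combines the two into an anime image. The sketch below illustrates that interface only; the module names, layer choices, and style_dim are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch of a content-plus-style generator interface.
# Module names, layer sizes, and style_dim are assumptions, not the paper's code.
class ContentEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Downsampling conv stack; the spatial feature map acts as the "content" code.
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, face):
        return self.net(face)  # content code: (B, 256, H/8, W/8)

class Decoder(nn.Module):
    def __init__(self, style_dim=8):
        super().__init__()
        # The style code modulates the decoder; here via a simple per-channel affine.
        self.affine = nn.Linear(style_dim, 256 * 2)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, content, style):
        gamma, beta = self.affine(style).chunk(2, dim=1)
        h = content * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]
        return self.net(h)  # anime image in [-1, 1]

face = torch.randn(1, 3, 256, 256)            # input face image
content = ContentEncoder()(face)              # content code derived from the face
style = torch.randn(1, 8)                     # randomly chosen style code
anime = Decoder(style_dim=8)(content, style)  # one of many possible anime outputs
```

Training would then pit such a generator against a discriminator using the adversarial loss described above, so that different random style codes yield different anime images for the same content code.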

Task                        Dataset        Model        Metric   Value   Global Rank
Image-to-Image Translation  cat2dog        StarGANv2    DFID     53.6    #2
Image-to-Image Translation  cat2dog        StarGANv2    FID      44.2    #2
Image-to-Image Translation  cat2dog        DRIT++       DFID     160.1   #3
Image-to-Image Translation  cat2dog        DRIT++       FID      91.5    #4
Image-to-Image Translation  cat2dog        CouncilGAN   DFID     172.5   #4
Image-to-Image Translation  cat2dog        CouncilGAN   FID      90.8    #3
Image-to-Image Translation  cat2dog        GNR          DFID     26.1    #1
Image-to-Image Translation  cat2dog        GNR          FID      26.9    #1
Image-to-Image Translation  selfie2anime   CouncilGAN   DFID     56.2    #2
Image-to-Image Translation  selfie2anime   CouncilGAN   FID      38.1    #2
Image-to-Image Translation  selfie2anime   CouncilGAN   LPIPS    0.43    #2
Image-to-Image Translation  selfie2anime   StarGANv2    DFID     83.0    #3
Image-to-Image Translation  selfie2anime   StarGANv2    FID      59.8    #3
Image-to-Image Translation  selfie2anime   StarGANv2    LPIPS    0.427   #3
Image-to-Image Translation  selfie2anime   GNR          DFID     35.6    #1
Image-to-Image Translation  selfie2anime   GNR          FID      34.4    #1
Image-to-Image Translation  selfie2anime   GNR          LPIPS    0.505   #1
Image-to-Image Translation  selfie2anime   DRIT++       DFID     94.6    #4
Image-to-Image Translation  selfie2anime   DRIT++       FID      63.8    #4
Image-to-Image Translation  selfie2anime   DRIT++       LPIPS    0.201   #4
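
In the selfie2anime rows, LPIPS is typically used as a diversity score for multimodal translation: the mean perceptual distance between translations of the same input, with higher values indicating a wider range of styles. Below is a minimal sketch of how such a score could be computed with the lpips PyPI package; the random tensors stand in for real translations, and this is not necessarily the exact evaluation protocol behind the numbers above.

```python
import itertools
import torch
import lpips  # pip install lpips

# Hypothetical set of translations of one face under different random style codes,
# standing in for real generator outputs; images are expected in [-1, 1].
outputs = [torch.rand(1, 3, 256, 256) * 2 - 1 for _ in range(8)]

loss_fn = lpips.LPIPS(net='alex')  # perceptual distance network
with torch.no_grad():
    dists = [loss_fn(a, b).item() for a, b in itertools.combinations(outputs, 2)]

# Average pairwise perceptual distance; higher means more diverse outputs.
print(f"mean pairwise LPIPS: {sum(dists) / len(dists):.3f}")
```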
