TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Face Sketch Synthesis	Multi-Modal CelebA-HQ	Diffusion	FID	26.09	# 1
multimodal generation	Multi-Modal CelebA-HQ	Diffusion	FID	26.09	# 1
Text-to-Image Generation	Multi-Modal-CelebA-HQ	Unite and Conquer	FID	26.09	# 4
Text-to-Image Generation	Multi-Modal-CelebA-HQ	Unite and Conquer	LPIPS	0.519	# 4

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unite-and-conquer-cross-dataset-multimodal/face-sketch-synthesis-on-multi-modal-celeba)](https://paperswithcode.com/sota/face-sketch-synthesis-on-multi-modal-celeba?p=unite-and-conquer-cross-dataset-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unite-and-conquer-cross-dataset-multimodal/multimodal-generation-on-multi-modal-celeba)](https://paperswithcode.com/sota/multimodal-generation-on-multi-modal-celeba?p=unite-and-conquer-cross-dataset-multimodal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/unite-and-conquer-cross-dataset-multimodal/text-to-image-generation-on-multi-modal)](https://paperswithcode.com/sota/text-to-image-generation-on-multi-modal?p=unite-and-conquer-cross-dataset-multimodal)`

Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models

CVPR 2023 · Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

Generating photos satisfying multiple constraints finds broad utility in the content creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining using paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models. Since each sampling step in the DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given various constraints. Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows using different off-the-shelf diffusion models trained across various datasets during sampling time alone to guide it to the desired outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
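
The abstract's remark that Gaussian sampling steps admit a closed-form combination can be illustrated with a generic fact: a weighted product of Gaussians is again Gaussian, with a precision-weighted mean. The sketch below is only an illustration of that fact, not the paper's sampling rule. The function name `fuse_gaussian_steps`, the isotropic-variance assumption, and the use of the weights as stand-ins for per-model reliability are all assumptions made here.

```python
import numpy as np

def fuse_gaussian_steps(means, variances, weights):
    """Fuse isotropic Gaussians N(mean_i, var_i * I) by a weighted product.

    The unnormalised product prod_i N(x; mean_i, var_i * I) ** w_i is itself
    Gaussian, with precision sum_i (w_i / var_i) and a precision-weighted mean.
    The weights are hypothetical stand-ins for per-model reliability.
    """
    means = np.asarray(means, dtype=float)          # shape (k, ...): one mean per model
    variances = np.asarray(variances, dtype=float)  # shape (k,)
    weights = np.asarray(weights, dtype=float)      # shape (k,)

    precisions = weights / variances                # weighted precision of each factor
    total_precision = precisions.sum()

    # Reshape to (k, 1, ..., 1) so each precision scales its own mean.
    bshape = (-1,) + (1,) * (means.ndim - 1)
    fused_mean = (precisions.reshape(bshape) * means).sum(axis=0) / total_precision
    fused_var = 1.0 / total_precision
    return fused_mean, fused_var

# Two hypothetical per-step means proposed by two conditional models.
mean_a = np.zeros(4)
mean_b = np.ones(4)
fused_mean, fused_var = fuse_gaussian_steps(
    [mean_a, mean_b], variances=[0.1, 0.1], weights=[0.75, 0.25]
)
```

With equal step variances and weights summing to one, the fused mean reduces to a plain weighted average of the individual means (here a quarter of the way from `mean_a` to `mean_b`), and the fused variance equals the shared step variance.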

Code

Nithin-GK/UniteandConquer official

Tasks

Face Generation

Face Sketch Synthesis

Image Generation

multimodal generation

Semantic Segmentation

Text-to-Face Generation

Text-to-Image Generation

Datasets

FFHQ

CelebA-HQ

Multi-Modal CelebA-HQ

Results from the Paper

Ranked #1 on Face Sketch Synthesis on Multi-Modal CelebA-HQ

Methods

Diffusion
