Text-to-Image Generation

276 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task at the intersection of computer vision and natural language processing in which the goal is to generate an image that corresponds to a given textual description. This typically involves encoding the text into a meaningful representation, such as a feature vector, and then conditioning an image generator on that representation.
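
As a minimal sketch of this two-step structure, the PyTorch snippet below pairs a toy text encoder with a toy image generator. Both modules are hypothetical, untrained placeholders standing in for the trained encoders (RNNs, transformers, CLIP) and generators (GANs, diffusion models) used by the papers listed below.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Toy stand-in for a trained text encoder."""
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, token_ids):
        _, h = self.rnn(self.embed(token_ids))
        return h[-1]  # one feature vector per caption

class ImageGenerator(nn.Module):
    """Toy stand-in for a trained conditional image generator."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64 * 64 * 3), nn.Tanh())

    def forward(self, text_feat):
        return self.net(text_feat).view(-1, 3, 64, 64)

tokens = torch.randint(0, 10000, (1, 12))        # a tokenized caption
image = ImageGenerator()(TextEncoder()(tokens))  # (1, 3, 64, 64) tensor
```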

Most implemented papers

Show and Tell: A Neural Image Caption Generator

karpathy/neuraltalk CVPR 2015

Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions.

Generative Adversarial Text to Image Synthesis

reedscot/icml2016 17 May 2016

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.

High-Resolution Image Synthesis with Latent Diffusion Models

compvis/stable-diffusion CVPR 2022

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.
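
Since Stable Diffusion is the best-known implementation of this paper, the decomposition can be made concrete with the Hugging Face diffusers library: the denoising loop runs in the autoencoder's latent space, and a final VAE decode maps the result to pixels. The sketch below assumes the public runwayml/stable-diffusion-v1-5 checkpoint; any compatible checkpoint would do.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Stop the pipeline before VAE decoding to expose the denoised latent.
latents = pipe("a photo of an astronaut riding a horse",
               output_type="latent").images
# The VAE decoder maps the compact latent back to pixel space.
with torch.no_grad():
    image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
print(image.shape)  # e.g. (1, 3, 512, 512)
```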

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

hanzhanggit/StackGAN ICCV 2017

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

taoxugit/AttnGAN CVPR 2018

In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.
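
The core mechanism is attention from image sub-regions to word features, so different regions can draw on different words during refinement. A minimal sketch with random tensors (shapes are illustrative, not the paper's configuration):

```python
import torch
import torch.nn.functional as F

words = torch.randn(1, 12, 256)    # word features: (batch, seq_len, dim)
regions = torch.randn(1, 64, 256)  # image region features: (batch, H*W, dim)

scores = torch.bmm(regions, words.transpose(1, 2))  # (1, 64, 12)
attn = F.softmax(scores, dim=-1)                    # per-region word weights
context = torch.bmm(attn, words)                    # (1, 64, 256)
# Each region's word context is fused with its features to drive the
# next refinement stage.
```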

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

hanzhanggit/StackGAN 19 Oct 2017

In this paper, we propose Stacked Generative Adversarial Networks (StackGANs) aimed at generating high-resolution photo-realistic images.
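
The stacked idea can be sketched as a chain of generators, each upsampling the previous stage's features and emitting an image at a higher resolution. The modules below are simplified placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """Simplified placeholder for one generator stage."""
    def __init__(self, in_ch, out_res):
        super().__init__()
        self.up = nn.Sequential(
            nn.Upsample(size=(out_res, out_res), mode="nearest"),
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        )
        self.to_img = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, h):
        h = self.up(h)
        return h, torch.tanh(self.to_img(h))  # features + image at this scale

z = torch.randn(1, 16, 4, 4)   # fused noise + text embedding (assumed shape)
h, img64 = Stage(16, 64)(z)    # 64x64 image
h, img128 = Stage(32, 128)(h)  # 128x128 image
h, img256 = Stage(32, 256)(h)  # 256x256 image
```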

Taming Transformers for High-Resolution Image Synthesis

CompVis/taming-transformers CVPR 2021

We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.
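
A rough sketch of the two-stage recipe: a CNN autoencoder (VQGAN) compresses the image into a short sequence of discrete codebook indices, and a transformer models that sequence autoregressively (text tokens are simply prepended for conditioning). Sizes below are illustrative, and the codes are faked rather than produced by a trained encoder.

```python
import torch
import torch.nn as nn

codebook = nn.Embedding(1024, 256)  # learned discrete visual codebook

# Stage 1 (assumed trained): the CNN encoder would map an image to a
# 16x16 grid of codebook indices -- faked here with random indices.
codes = torch.randint(0, 1024, (1, 16 * 16))

# Stage 2: a causal transformer predicts each code from the previous ones.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
transformer = nn.TransformerEncoder(layer, num_layers=2)
mask = torch.full((256, 256), float("-inf")).triu(1)  # causal attention mask
hidden = transformer(codebook(codes), mask=mask)
logits = hidden @ codebook.weight.T  # next-code prediction logits
```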

Zero-Shot Text-to-Image Generation

openai/DALL-E 24 Feb 2021

Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset.

Hierarchical Text-Conditional Image Generation with CLIP Latents

lucidrains/DALLE2-pytorch 13 Apr 2022

Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style.
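
CLIP itself is easy to exercise; the sketch below scores image-text similarity with the Hugging Face transformers implementation (the checkpoint name is one public choice, and the blank image is a stand-in for a real photo).

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), "white")  # stand-in for a real photo
inputs = processor(text=["a dog", "a cat"], images=image,
                   return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))  # text-match probabilities
```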

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

rinongal/textual_inversion 2 Aug 2022

Yet, it is unclear how the freedom of text-guided generation can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.
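
At inference time, a learned textual-inversion embedding is loaded as a new pseudo-word that can be used in prompts. A sketch with diffusers, assuming the public cat-toy concept from the sd-concepts-library as the example embedding:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
# Load a pre-trained concept embedding; it registers the token "<cat-toy>".
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("a <cat-toy> on a beach at sunset").images[0]
image.save("personalized.png")
```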