Text-to-Image Generation
276 papers with code • 11 benchmarks • 18 datasets
Text-to-Image Generation is a task at the intersection of computer vision and natural language processing whose goal is to generate an image that corresponds to a given textual description. A model first encodes the text into a meaningful representation, such as a feature vector, and then uses that representation to synthesize an image matching the description.
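For illustration, here is a minimal sketch of this text-encode-then-generate pipeline using the Hugging Face diffusers library; the checkpoint name is an assumption, and any compatible Stable Diffusion checkpoint could be substituted.

```python
# Minimal text-to-image sketch with Hugging Face diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; swap in any SD model
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# The prompt is encoded into a feature representation internally,
# and a diffusion model then generates an image conditioned on it.
image = pipe("a red bicycle leaning against a brick wall").images[0]
image.save("bicycle.png")
```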
Libraries
Use these libraries to find Text-to-Image Generation models and implementations.
Latest papers
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Recent advancements in diffusion models have positioned them at the forefront of image generation.
RL for Consistency Models: Faster Reward Guided Text-to-Image Generation
Diffusion models require many iterative sampling steps; to overcome this limitation, consistency models learn a new class of generative models that directly map noise to data, producing an image in as few as one sampling iteration.
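A rough sketch of what one-step sampling looks like, assuming a trained consistency function (`consistency_fn` is a hypothetical name for the network; this is an illustration of the idea, not the paper's code):

```python
# One-step sampling with a consistency model: a single forward pass
# maps pure noise directly to a clean sample, with no denoising loop.
import torch

def one_step_sample(consistency_fn, shape, sigma_max=80.0, device="cuda"):
    # Start from Gaussian noise at the maximum noise level.
    z = torch.randn(shape, device=device) * sigma_max
    sigmas = torch.full((shape[0],), sigma_max, device=device)
    # consistency_fn(x, sigma) is assumed to return the clean sample.
    return consistency_fn(z, sigmas)
```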
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities.
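As a concrete example of the image-text alignment CLIP provides, the sketch below scores an image against candidate captions with the Hugging Face transformers CLIP implementation; the checkpoint name is an assumption. Note that standard CLIP truncates text to 77 tokens, which is the long-text limitation Long-CLIP targets.

```python
# Scoring image-text alignment with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # assumed local image file
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image: similarity of the image to each caption.
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)
```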
CLIP-VQDiffusion: Language-Free Training of Text-to-Image Generation Using CLIP and Vector Quantized Diffusion Model
There has been significant progress in text-conditional image generation models.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Existing methods extract attention maps only for words present in the text prompt, which limits the generation of segmentation masks derived from word tokens not contained in it.
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
We also observe that the denoising timestep at which noise blending is initiated is key to identity preservation and layout.
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
In this paper, we propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category.
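The object-oriented analogy can be made concrete with a short sketch (an illustration of the analogy only, not the authors' method): the subject inherits the general attributes of its semantic category and adds only its subject-specific ones.

```python
# The semantic category acts as the base class with shared attributes.
class Dog:
    legs = 4
    can_bark = True

# The one-shot subject is a derived class: it inherits the category's
# attributes and overrides or adds only what is specific to itself.
class MyDog(Dog):
    fur_color = "golden"

subject = MyDog()
print(subject.legs, subject.can_bark, subject.fur_color)  # 4 True golden
```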