Text-to-Image Generation

282 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task at the intersection of computer vision and natural language processing: given a textual description, the goal is to generate an image that matches it. Typically the text input is first encoded into a representation such as a feature vector, and a generative model conditioned on that representation then produces the image.
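The two stages described above (encode the text, then generate from the encoding) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: `encode_text` is a hashed bag-of-words stand-in for a learned text encoder, and `generate_image` is a fixed random linear map standing in for a trained generator. It is not how any of the models listed on this page work internally; it only shows the shape of the pipeline.

```python
import hashlib
import random

EMBED_DIM = 16   # size of the text feature vector (arbitrary choice)
IMAGE_SIZE = 8   # toy output is an 8x8 grayscale image


def encode_text(prompt: str) -> list[float]:
    """Hypothetical encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * EMBED_DIM
    for token in prompt.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % EMBED_DIM] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec


def generate_image(features: list[float], seed: int = 0) -> list[list[float]]:
    """Hypothetical generator: a fixed random linear map from features to pixels."""
    rng = random.Random(seed)  # fixed seed stands in for trained weights
    pixels = []
    for _ in range(IMAGE_SIZE * IMAGE_SIZE):
        weights = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
        activation = sum(w * f for w, f in zip(weights, features))
        # Clamp to the [0, 1] intensity range.
        pixels.append(min(1.0, max(0.0, 0.5 + 0.5 * activation)))
    return [pixels[r * IMAGE_SIZE:(r + 1) * IMAGE_SIZE] for r in range(IMAGE_SIZE)]


image = generate_image(encode_text("a small red bird on a branch"))
print(len(image), len(image[0]))  # prints "8 8"
```

In a real system both stand-ins would be learned models; the point of the sketch is only that the image depends on the prompt solely through the intermediate feature vector.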

Benchmarks

These leaderboards track progress in Text-to-Image Generation.

Dataset	Best Model
MS COCO	Parti Finetuned
CUB	TLDM
Multi-Modal-CelebA-HQ	Swinv2-Imagen
Oxford 102 Flowers	VQ-Diffusion-F
Conceptual Captions	Contextual RQ-Transformer
LHQC	NUWA-Infinity
MS-COCO	AttnGAN
GeNeVA (CoDraw)	LatteGAN
GeNeVA (i-CLEVR)	LatteGAN
LAION COCO	Parti Finetuned
Colors	BiLSTMS on color generation

Libraries

Use these libraries to find Text-to-Image Generation models and implementations.

Library	Papers	Stars
faceonlive/ai-research	4	240
hanzhanggit/StackGAN	3	1,852
kakaobrain/rq-vae-transformer	3	705
hanzhanggit/StackGAN-Pytorch	3	483

See all 18 libraries.

Subtasks

Concept Alignment

Conditional Text-to-Image Synthesis

Consistent Character Generation

DreamBooth Personalized Generation

Latest papers

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

tencent/hunyuandit • 13 Mar 2024 • 1,565 stars

However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper.

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

shihaozhaozsh/lavi-bridge • 12 Mar 2024 • 261 stars

In this paper, we explore this objective and propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

modelscope/facechain • 11 Mar 2024 • 8,414 stars

In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category.

DivCon: Divide and Conquer for Progressive Text-to-Image Generation

divcon-gen/divcon • 11 Mar 2024

To further improve T2I models' capability in numerical and spatial reasoning, the layout is employed as an intermedium to bridge large language models and layout-based diffusion models.

MACE: Mass Concept Erasure in Diffusion Models

shilin-lu/mace • 10 Mar 2024 • 248 stars

In this paper, we introduce MACE, a finetuning framework for the task of mass concept erasure.

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

shihaozhaozsh/lavi-bridge • 8 Mar 2024 • 261 stars

Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation.

Face2Diffusion for Fast and Editable Face Personalization

mapooon/face2diffusion • 8 Mar 2024

However, it is still challenging for previous methods to preserve both the identity similarity and editability due to overfitting to training samples.

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

univ-esuty/noisecollage • 6 Mar 2024

The current layout-aware text-to-image diffusion models still have several issues, including mismatches between the text and layout conditions and quality degradation of generated images.

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

ma-labo/promptcharm • 6 Mar 2024

However, prompting remains challenging for novice users due to the complexity of the stable diffusion model and the non-trivial efforts required for iteratively editing and refining the text prompts.

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

hxixixh/adaflow • 5 Mar 2024

Rectified flow is a recent generative model formulation that connects data and noise in a straight line.
