Text-to-Image Generation

282 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This involves converting the text input into a meaningful representation, such as a feature vector, and then using this representation to generate an image that matches the description.
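The two stages described above (encode the text into a feature vector, then generate an image from that vector) can be shown with a deliberately tiny toy. This is an illustrative sketch only: the hashed bag-of-words encoder and the fixed random linear "generator" are assumptions made up for this example. Real systems use learned text encoders (for example CLIP or T5) and learned generators such as diffusion models.

```python
import hashlib
import math
import random

DIM = 8  # toy feature-vector size (an arbitrary choice for this sketch)


def encode_text(prompt: str, dim: int = DIM) -> list[float]:
    """Toy text encoder: hashed bag-of-words, scaled to unit length."""
    vec = [0.0] * dim
    for token in prompt.lower().split():
        # md5 gives a hash that is stable across runs (unlike built-in hash()).
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def generate_image(features: list[float], size: int = 4, seed: int = 0) -> list[list[float]]:
    """Toy generator: fixed random linear map + sigmoid -> size x size grayscale grid."""
    rng = random.Random(seed)  # a fixed seed stands in for "trained" weights
    pixels = []
    for _ in range(size * size):
        weights = [rng.uniform(-1.0, 1.0) for _ in features]
        z = sum(w * f for w, f in zip(weights, features))
        pixels.append(1.0 / (1.0 + math.exp(-z)))  # squash into (0, 1)
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


image = generate_image(encode_text("a red cube on a wooden table"))
for row in image:
    print(" ".join(f"{p:.2f}" for p in row))
```

The same prompt always yields the same image here; what a real model adds is that the mapping is learned, so the pixels actually depict what the text says.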


DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

tencent/hunyuandit 13 Mar 2024

However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper.

1,565 GitHub stars

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

shihaozhaozsh/lavi-bridge 12 Mar 2024

In this paper, we explore this objective and propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.

261 GitHub stars
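One generic ingredient for plugging an arbitrary language model into a generative vision model is a small adapter that maps the language model's token embeddings to the width the vision model's text conditioning expects. The sketch below is a hypothetical linear adapter written for illustration. It is not LaVi-Bridge's actual architecture, and the sizes are made up.

```python
import random


class LinearAdapter:
    """Hypothetical adapter: projects d_text-dim token embeddings to d_vision dims."""

    def __init__(self, d_text: int, d_vision: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / d_text ** 0.5  # keep output magnitudes roughly stable
        self.weight = [[rng.gauss(0.0, scale) for _ in range(d_text)] for _ in range(d_vision)]
        self.bias = [0.0] * d_vision

    def __call__(self, tokens: list[list[float]]) -> list[list[float]]:
        # Apply the same projection to every token embedding in the sequence.
        return [
            [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(self.weight, self.bias)]
            for tok in tokens
        ]


# Hypothetical sizes: a language model with 6-dim hidden states feeding a
# vision model whose text conditioning expects 4-dim context vectors.
lm_tokens = [[0.1 * (i + j) for j in range(6)] for i in range(3)]  # 3 tokens
adapter = LinearAdapter(d_text=6, d_vision=4)
context = adapter(lm_tokens)
print(len(context), len(context[0]))  # 3 4
```

Because only the adapter is new, swapping in a different language model changes `d_text` and the adapter weights, while both pre-trained models can stay as they are.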

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

modelscope/facechain 11 Mar 2024

In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category.

8,414 GitHub stars
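The object-oriented analogy above can be made concrete in a few lines. This only illustrates the analogy. The class names are hypothetical, and the paper applies the idea to training a generative model, not to literal Python classes. The point is that a specific subject (the derived class) automatically inherits what its category (the base class) can do, even when a single reference image never shows those attributes.

```python
class Dog:
    """Base class: the semantic category and the attributes every member shares."""

    def __init__(self, name: str):
        self.name = name

    def can(self) -> list[str]:
        return ["run", "bark", "swim"]


class MyCorgi(Dog):
    """Derived class: one specific subject, e.g. known from a single reference photo."""

    def appearance(self) -> str:
        return "short legs, orange-and-white coat"


subject = MyCorgi("Biscuit")
print(subject.can())             # inherited from the category
print(subject.appearance())      # subject-specific detail
print(isinstance(subject, Dog))  # True
```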

DivCon: Divide and Conquer for Progressive Text-to-Image Generation

divcon-gen/divcon 11 Mar 2024

To further improve T2I models' capability in numerical and spatial reasoning, the layout is employed as an intermedium to bridge large language models and layout-based diffusion models.

9 GitHub stars
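Using a layout as the intermediate step helps because counts and spatial relations become explicit, checkable data before any pixels are generated. The sketch below assumes a simple layout format (a phrase plus a normalized bounding box) and two example checks. The `Box` type, the checks, and the example layout are hypothetical and are not DivCon's interface.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One layout entry: a phrase and a normalized bounding box (0..1 coordinates)."""
    phrase: str
    x0: float
    y0: float
    x1: float
    y1: float


def count(layout: list[Box], phrase: str) -> int:
    """Numerical check: how many boxes carry this phrase."""
    return sum(1 for b in layout if b.phrase == phrase)


def left_of(a: Box, b: Box) -> bool:
    """Spatial check: box a ends before box b starts horizontally."""
    return a.x1 <= b.x0


# Hypothetical layout an LLM might propose for "three apples to the left of a bowl".
layout = [
    Box("apple", 0.05, 0.55, 0.20, 0.70),
    Box("apple", 0.20, 0.55, 0.35, 0.70),
    Box("apple", 0.12, 0.40, 0.27, 0.55),
    Box("bowl", 0.55, 0.45, 0.90, 0.80),
]
bowl = next(b for b in layout if b.phrase == "bowl")
print(count(layout, "apple") == 3)                                   # True
print(all(left_of(b, bowl) for b in layout if b.phrase == "apple"))  # True
```

A layout that passes such checks can then be handed to a layout-conditioned diffusion model.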

MACE: Mass Concept Erasure in Diffusion Models

shilin-lu/mace 10 Mar 2024

In this paper, we introduce MACE, a finetuning framework for the task of mass concept erasure.

248 GitHub stars

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

shihaozhaozsh/lavi-bridge 8 Mar 2024

Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation.

261 GitHub stars

Face2Diffusion for Fast and Editable Face Personalization

mapooon/face2diffusion 8 Mar 2024

However, it is still challenging for previous methods to preserve both the identity similarity and editability due to overfitting to training samples.

51 GitHub stars

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

univ-esuty/noisecollage 6 Mar 2024

The current layout-aware text-to-image diffusion models still have several issues, including mismatches between the text and layout conditions and quality degradation of generated images.

33 GitHub stars
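Going only by the title's "noise cropping and merging", one way to picture the idea is this: the model makes a separate noise estimate for each layout region, crops each estimate to its region mask, and merges the crops into one noise map. The sketch below is a hypothetical simplification. The averaging of overlaps and the background fallback are assumptions for illustration, and the paper's actual merging rule may differ.

```python
Grid = list[list[float]]


def merge_noise(region_noises: list[Grid], masks: list[list[list[bool]]], background: Grid) -> Grid:
    """Crop each per-object noise estimate to its mask and merge into one map.

    Pixels covered by several masks are averaged; uncovered pixels take the
    background estimate.
    """
    h, w = len(background), len(background[0])
    merged = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            covering = [n[y][x] for n, m in zip(region_noises, masks) if m[y][x]]
            merged[y][x] = sum(covering) / len(covering) if covering else background[y][x]
    return merged


noise_a = [[1.0, 1.0], [1.0, 1.0]]
noise_b = [[3.0, 3.0], [3.0, 3.0]]
mask_a = [[True, True], [False, False]]  # object A: top row
mask_b = [[True, False], [True, False]]  # object B: left column
background = [[0.0, 0.0], [0.0, 0.0]]
print(merge_noise([noise_a, noise_b], [mask_a, mask_b], background))
# [[2.0, 1.0], [3.0, 0.0]]
```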

PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement

ma-labo/promptcharm 6 Mar 2024

However, prompting remains challenging for novice users due to the complexity of the stable diffusion model and the non-trivial efforts required for iteratively editing and refining the text prompts.

9 GitHub stars

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

hxixixh/adaflow 5 Mar 2024

Rectified flow is a recent generative model formulation that connects data and noise in a straight line.

5 GitHub stars
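The straight line between data and noise can be written out directly. This sketch uses a common convention with data x0 at t = 0 and noise eps at t = 1, so that x_t = (1 - t) * x0 + t * eps. Along that path the velocity dx_t/dt = eps - x0 is constant. A network would be trained to predict this velocity from (x_t, t). Here an "oracle" that knows x0 stands in for the network, an assumption made so the example runs without training. Because the path is straight and the velocity constant, Euler integration from noise back to data is exact even with a single step.

```python
import random


def interpolate(x0: list[float], eps: list[float], t: float) -> list[float]:
    """x_t = (1 - t) * x0 + t * eps: data at t=0, noise at t=1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]


def velocity_target(x0: list[float], eps: list[float]) -> list[float]:
    """dx_t/dt = eps - x0, the same at every t because the path is straight."""
    return [b - a for a, b in zip(x0, eps)]


def euler_sample(eps, velocity_fn, steps: int) -> list[float]:
    """Integrate dx/dt = v(x, t) backwards from t=1 (noise) to t=0 (data)."""
    x = list(eps)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]                    # a "data" point
eps = [rng.gauss(0.0, 1.0) for _ in x0]  # Gaussian noise


def oracle(x, t):
    """Stands in for a trained velocity network (hypothetical)."""
    return velocity_target(x0, eps)


print(euler_sample(eps, oracle, steps=1))  # ~[0.5, -1.0, 2.0] up to float rounding
```

With a learned network the predicted velocity is only approximately constant, so samplers in practice still take several steps.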