Text-to-Image Generation

275 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task at the intersection of computer vision and natural language processing: given a textual description, produce an image that depicts it. This typically involves encoding the text into a meaningful representation, such as an embedding vector, and conditioning an image generator on that representation so that the output matches the description.
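
As a concrete illustration, the sketch below generates an image from a prompt with the Hugging Face diffusers library. The checkpoint name and prompt are examples chosen for this sketch, not tied to any paper listed on this page.

import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (example checkpoint).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# The pipeline first encodes the prompt into a text embedding (the
# "meaningful representation"), then denoises random latents conditioned
# on that embedding and decodes them into pixels.
image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")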

Latest papers with no code

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models

no code yet • 15 Apr 2024

Large diffusion-based Text-to-Image (T2I) models have shown impressive generative power for both plain text-to-image synthesis and spatially conditioned image generation.
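
For readers unfamiliar with the spatial conditioning mentioned above, here is a minimal sketch using diffusers' ControlNet support. The checkpoint names and the edge-map file are illustrative assumptions, and this is a generic conditioned pipeline, not MaxFusion's method (which has no public code yet).

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# A ControlNet trained on Canny edges (example checkpoint).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# A spatial map (edges, depth, pose, ...) constrains the output layout,
# while the text prompt controls content and style.
edge_map = load_image("edges.png")  # hypothetical precomputed Canny edge image
image = pipe("a brick cottage in a forest", image=edge_map).images[0]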

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

no code yet • 14 Apr 2024

Recent progress in text-to-3D creation has been propelled by integrating the strong priors of text-to-image diffusion models into the 3D domain.
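
The usual mechanism for this integration is score distillation sampling (SDS), introduced by DreamFusion, which pushes gradients from a frozen text-to-image diffusion model into the 3D scene parameters theta. DreamScape's exact objective may differ, but the canonical SDS gradient is:

\nabla_\theta \mathcal{L}_{\text{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\, \frac{\partial x}{\partial \theta} \right]

where x is an image rendered from the 3D representation, x_t its noised version at timestep t, y the text prompt, w(t) a weighting function, and \hat{\epsilon}_\phi the frozen diffusion model's noise prediction.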

Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt

no code yet • 8 Apr 2024

Then, the object images are employed as additional prompts to help the diffusion model better understand the relationship between foreground and background regions during image generation.

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

no code yet • 5 Apr 2024

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.

Diverse and Tailored Image Generation for Zero-shot Multi-label Classification

no code yet • 4 Apr 2024

Our approach introduces a novel image generation framework that produces multi-label synthetic images of unseen classes for classifier training.

On the Scalability of Diffusion-based Text-to-Image Generation

no code yet • 3 Apr 2024

On the data scaling side, we show that the quality and diversity of the training set matter more than dataset size alone.

MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment

no code yet • 3 Apr 2024

We present MatAtlas, a method for consistent text-guided 3D model texturing.

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

no code yet • 3 Apr 2024

To build MuLAn, we developed a training-free pipeline that decomposes a monocular RGB image into a stack of RGBA layers comprising a background and isolated instances.
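
To make the layer representation concrete, the following sketch re-composites such an RGBA stack into an RGB image with standard "over" alpha blending. The file names and the background-first ordering are assumptions for illustration, not the dataset's documented layout.

import numpy as np
from PIL import Image

def composite(layer_paths):
    # Load all layers as float RGBA in [0, 1]; layers are assumed to share
    # the same height and width, with the background listed first.
    layers = [np.asarray(Image.open(p).convert("RGBA"), dtype=np.float32) / 255.0
              for p in layer_paths]
    out = layers[0][..., :3]  # background assumed fully opaque
    for layer in layers[1:]:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out  # "over" compositing
    return Image.fromarray((out * 255).astype(np.uint8))

# Hypothetical file names for a two-instance scene.
composite(["background.png", "instance_0.png", "instance_1.png"]).save("recomposed.png")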

Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

no code yet • 1 Apr 2024

In this survey, we review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture.

Condition-Aware Neural Network for Controlled Image Generation

no code yet • 1 Apr 2024

In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weights of the neural network.
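
As a rough, hypernetwork-style illustration of what manipulating the weights can mean (not the CAN paper's published design), a layer can generate its convolution kernel from a condition embedding, so the effective parameters change per condition rather than per feature map:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionAwareConv(nn.Module):
    """Sketch of a condition-aware layer: a small network maps the condition
    embedding (e.g., of timestep and class/text) to the conv kernel."""
    def __init__(self, cond_dim, in_ch, out_ch, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.weight_gen = nn.Linear(cond_dim, out_ch * in_ch * k * k)

    def forward(self, x, cond):
        # cond: (cond_dim,) condition embedding shared across the batch.
        w = self.weight_gen(cond).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2)

layer = ConditionAwareConv(cond_dim=64, in_ch=8, out_ch=8)
x, cond = torch.randn(1, 8, 16, 16), torch.randn(64)
y = layer(x, cond)  # shape (1, 8, 16, 16)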