Text-to-Image Generation
275 papers with code • 11 benchmarks • 18 datasets
Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This involves converting the text input into a meaningful representation, such as a feature vector, and then using this representation to generate an image that matches the description.
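In practice, open-source libraries expose this task as a single pipeline call. Below is a minimal sketch using the Hugging Face diffusers library; the checkpoint ID, prompt, and output filename are illustrative placeholders, and exact arguments may vary between library versions.

```python
# Minimal text-to-image sketch using Hugging Face diffusers.
# The checkpoint and prompt are illustrative; any compatible
# Stable Diffusion checkpoint can be substituted.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# The prompt is encoded into a feature representation by the
# pipeline's text encoder, which then conditions the iterative
# denoising process that produces the image.
image = pipe("a watercolor painting of a fox in a forest").images[0]
image.save("fox.png")
```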
Latest papers with no code
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative capabilities, both for plain text-conditioned synthesis and for spatially conditioned image generation.
DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling
Recent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain.
Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt
The object images are then used as additional prompts to help the diffusion model better understand the relationship between foreground and background regions during image generation.
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging.
Diverse and Tailored Image Generation for Zero-shot Multi-label Classification
Our approach introduces a novel image generation framework that produces multi-label synthetic images of unseen classes for classifier training.
On the Scalability of Diffusion-based Text-to-Image Generation
On the data scaling side, we show that the quality and diversity of the training set matter more than raw dataset size.
MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment
We present MatAtlas, a method for consistent text-guided 3D model texturing.
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
To build MuLAn, we developed a training-free pipeline that decomposes a monocular RGB image into a stack of RGBA layers comprising a background and isolated instances.
Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation
In this survey, we review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture.
Condition-Aware Neural Network for Controlled Image Generation
In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weights of the neural network.
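The weight-manipulation idea can be sketched generically: a small network maps the condition (e.g., a class or text embedding) to a modulation of a layer's weights. The PyTorch snippet below is an illustrative sketch of condition-dependent weight scaling, not the CAN paper's actual architecture; all module names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ConditionAwareLinear(nn.Module):
    """Illustrative sketch: a linear layer whose weight rows are
    rescaled per sample by a condition embedding. Not the CAN
    paper's actual architecture."""

    def __init__(self, in_dim, out_dim, cond_dim):
        super().__init__()
        self.base_weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        # Maps the condition to a per-output-channel scale.
        self.cond_to_scale = nn.Linear(cond_dim, out_dim)

    def forward(self, x, cond):
        # Scaling the output channels is equivalent to rescaling
        # the corresponding weight rows for each sample.
        scale = 1.0 + self.cond_to_scale(cond)   # (batch, out_dim)
        out = x @ self.base_weight.t()           # shared base weights
        return out * scale                       # condition-aware output

# Usage: x is a feature batch, cond a condition embedding.
layer = ConditionAwareLinear(in_dim=64, out_dim=128, cond_dim=32)
x = torch.randn(4, 64)
cond = torch.randn(4, 32)
y = layer(x, cond)  # shape (4, 128)
```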