Text-to-Image Generation
282 papers with code • 11 benchmarks • 18 datasets
Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This involves converting the text input into a meaningful representation, such as a feature vector, and then using this representation to generate an image that matches the description.
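The two stages described above (encode the text into a feature vector, then generate an image from that vector) can be illustrated with a deliberately tiny sketch. Everything here is a hypothetical stand-in rather than a real model: `encode_text` hashes tokens instead of using a learned language model, and `generate_image` applies a fixed formula instead of a trained generator. It only shows how data flows through the pipeline.

```python
import hashlib
import math
from typing import List


def encode_text(prompt: str, dim: int = 8) -> List[float]:
    """Toy text encoder (assumption): hash tokens into an L2-normalised vector."""
    vec = [0.0] * dim
    for token in prompt.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        for i in range(dim):
            # Map each byte to [-0.5, 0.5] and accumulate per dimension.
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def generate_image(features: List[float], size: int = 4) -> List[List[int]]:
    """Toy generator (assumption): map features to a size x size grayscale grid."""
    image = []
    for row in range(size):
        pixels = []
        for col in range(size):
            # Mix the features with fixed, position-dependent weights.
            value = sum(
                f * math.sin((row + 1) * (col + 1) * (k + 1))
                for k, f in enumerate(features)
            )
            # Squash into the 0..255 intensity range.
            pixels.append(int(round(255 * (0.5 + 0.5 * math.tanh(value)))))
        image.append(pixels)
    return image


def text_to_image(prompt: str, dim: int = 8, size: int = 4) -> List[List[int]]:
    """Run both stages: text -> feature vector -> image."""
    return generate_image(encode_text(prompt, dim), size)


if __name__ == "__main__":
    for line in text_to_image("a red cube on a table"):
        print(line)
```

The same prompt always yields the same grid, while different prompts yield different feature vectors. Real systems replace both functions with learned networks.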
Libraries
Use these libraries to find Text-to-Image Generation models and implementations
Datasets
Subtasks
Latest papers
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper.
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
In this paper, we explore this objective and propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category.
DivCon: Divide and Conquer for Progressive Text-to-Image Generation
To further improve T2I models' capability in numerical and spatial reasoning, the layout is employed as an intermedium to bridge large language models and layout-based diffusion models.
MACE: Mass Concept Erasure in Diffusion Models
In this paper, we introduce MACE, a finetuning framework for the task of mass concept erasure.
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation.
Face2Diffusion for Fast and Editable Face Personalization
However, it is still challenging for previous methods to preserve both the identity similarity and editability due to overfitting to training samples.
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
The current layout-aware text-to-image diffusion models still have several issues, including mismatches between the text and layout conditions and quality degradation of generated images.
PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement
However, prompting remains challenging for novice users due to the complexity of the stable diffusion model and the non-trivial efforts required for iteratively editing and refining the text prompts.
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Rectified flow is a recent generative model formulation that connects data and noise in a straight line.
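As a minimal sketch of what "connects data and noise in a straight line" means, the snippet below linearly interpolates between a data sample and a noise sample. The convention that t = 0 is data and t = 1 is noise, and the function names, are assumptions made for illustration; they are not taken from the listing above.

```python
import random
from typing import List


def rectified_flow_point(x0: List[float], noise: List[float], t: float) -> List[float]:
    """Point on the straight path: z_t = (1 - t) * x0 + t * noise."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def rectified_flow_velocity(x0: List[float], noise: List[float]) -> List[float]:
    """Velocity along the path; constant in t because the path is a line."""
    return [b - a for a, b in zip(x0, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [1.0, -2.0, 0.5]                        # stand-in for a data sample
    noise = [rng.gauss(0.0, 1.0) for _ in x0]    # Gaussian noise sample
    for t in (0.0, 0.5, 1.0):
        print(t, rectified_flow_point(x0, noise, t))
    print("velocity", rectified_flow_velocity(x0, noise))
```

Because the path is linear, the velocity does not depend on t. In a trained model, a network would be fitted to predict this velocity from the interpolated point and t.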