Text-to-Image Generation

276 papers with code • 11 benchmarks • 18 datasets

Text-to-Image Generation is a task in computer vision and natural language processing where the goal is to generate an image that corresponds to a given textual description. This involves converting the text input into a meaningful representation, such as a feature vector, and then using this representation to generate an image that matches the description.
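
To make this flow concrete, here is a minimal sketch using the Hugging Face diffusers library; the checkpoint name and prompt are illustrative choices, not part of any benchmark setup.

```python
# Minimal text-to-image sketch, assuming the `diffusers` library and the
# public `stabilityai/stable-diffusion-2-1` checkpoint are available.
import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained pipeline (text encoder + diffusion model + image decoder).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# The pipeline encodes the prompt into a feature representation and
# conditions the image generator on it.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```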


SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

IDKiro/sdxs 25 Mar 2024

Recent advancements in diffusion models have positioned them at the forefront of image generation.

487 stars

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Owen-Oertell/rlcm 25 Mar 2024

To overcome the slow, iterative sampling of diffusion models, consistency models learn a new class of generative models that directly map noise to data, yielding a model that can generate an image in as few as one sampling iteration.

33 stars
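
As a rough illustration of why consistency models are fast, here is a conceptual one-step sampler; `consistency_model` is a placeholder for a trained network f(x, t), not an API from the rlcm repository.

```python
# Conceptual sketch of one-step generation with a consistency model:
# draw noise at the largest noise level and map it to data in one pass.
import torch

def one_step_sample(consistency_model, shape, sigma_max=80.0, device="cuda"):
    # Pure noise at the maximum noise scale...
    x_T = torch.randn(shape, device=device) * sigma_max
    t = torch.full((shape[0],), sigma_max, device=device)
    # ...mapped directly back to a clean sample in a single forward pass,
    # instead of dozens of iterative denoising steps.
    return consistency_model(x_T, t)
```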

Long-CLIP: Unlocking the Long-Text Capability of CLIP

beichenzbc/long-clip 22 Mar 2024

Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities.

315 stars
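
For reference, this is how CLIP's aligned embedding space is typically queried with the transformers library; note that standard CLIP truncates text to 77 tokens, which is the limitation Long-CLIP targets.

```python
# Scoring text-image alignment with a public CLIP checkpoint.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
texts = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)
# Similarity logits between the image and each caption; higher means
# the text and image embeddings are better aligned.
probs = outputs.logits_per_image.softmax(dim=-1)
```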

CLIP-VQDiffusion: Language-Free Training of Text-to-Image Generation Using CLIP and a Vector Quantized Diffusion Model

infiniq-ai1/clipvqdiffusion 22 Mar 2024

There has been significant progress in text-conditional image generation models.

3 stars

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

vpulab/ovam 21 Mar 2024

Existing extensions primarily rely on extracting cross-attentions linked to the prompt words used for image synthesis, which limits the generation of segmentation masks for word tokens not contained in the text prompt.

28 stars
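
A simplified sketch of the final step such pipelines share: converting a token's cross-attention map into a binary pseudo-mask. How `attn` is extracted from the diffusion model, and OVAM's token optimization itself, are omitted here.

```python
# Toy step: threshold one word token's cross-attention map into a mask.
import torch
import torch.nn.functional as F

def mask_from_attention(attn: torch.Tensor, out_size=(512, 512), thresh=0.5):
    # attn: (h, w) attention weights for a single word token.
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)  # normalize to [0, 1]
    # Upsample the low-resolution map to image resolution.
    attn = F.interpolate(
        attn[None, None], size=out_size, mode="bilinear", align_corners=False
    )[0, 0]
    return attn > thresh  # boolean segmentation pseudo-mask
```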

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

leonhlj/fouriscale 19 Mar 2024

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.

95 stars
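
To make the frequency perspective concrete, here is an illustrative low-pass filter over feature maps using torch.fft; it demonstrates the general idea of suppressing high-frequency repetition, not FouriScale's actual operators.

```python
# Illustrative frequency-domain low-pass filter on a feature map.
import torch

def low_pass(x: torch.Tensor, keep: float = 0.25) -> torch.Tensor:
    # x: (B, C, H, W). Zeroing high frequencies suppresses the kind of
    # repetitive patterns that appear beyond the training resolution.
    X = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    B, C, H, W = x.shape
    mask = torch.zeros(H, W, device=x.device)
    h, w = int(H * keep / 2), int(W * keep / 2)
    mask[H // 2 - h : H // 2 + h, W // 2 - w : W // 2 + w] = 1.0  # keep low band
    return torch.fft.ifft2(torch.fft.ifftshift(X * mask, dim=(-2, -1))).real
```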

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

kongzhecn/omg 16 Mar 2024

We also observe that the denoising timestep at which noise blending is initiated is key to both identity preservation and layout control.

519 stars
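
A conceptual sketch of mask-guided noise blending gated by a starting timestep; the function names, arguments, and threshold below are hypothetical and only illustrate the idea described above, not OMG's implementation.

```python
# Conceptual mask-guided blending of two concept models' noise predictions.
import torch

def blended_noise(eps_a, eps_b, mask, t, t_start=800):
    # eps_a / eps_b: noise predictions from two subject-specific models.
    # mask: (1, 1, H, W) soft layout mask for subject A; t counts down.
    # Before t_start the global layout is still forming, so keep a single
    # prediction; afterwards, blend per region to preserve each identity.
    if t > t_start:
        return eps_a
    return mask * eps_a + (1 - mask) * eps_b
```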

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

ironjr/streammultidiffusion 14 Mar 2024

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.

392 stars

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

shihaozhaozsh/lavi-bridge 12 Mar 2024

In this paper, we explore this objective and propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.

254 stars
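
The bridging idea can be sketched as a small trainable adapter that projects a language model's features into a vision model's conditioning space; the dimensions and module names below are assumptions for illustration, not LaVi-Bridge's actual modules.

```python
# Hypothetical adapter bridging a frozen LLM and a diffusion model.
import torch
import torch.nn as nn

class TextAdapter(nn.Module):
    def __init__(self, lm_dim=4096, cond_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(lm_dim, cond_dim), nn.GELU(), nn.Linear(cond_dim, cond_dim)
        )

    def forward(self, lm_hidden_states):  # (B, T, lm_dim)
        # Project frozen language-model token features into the space the
        # vision model's cross-attention layers expect as conditioning.
        return self.proj(lm_hidden_states)  # (B, T, cond_dim)
```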

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

modelscope/facechain 11 Mar 2024

In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category.

8,349 stars
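
The paper's analogy maps directly onto familiar code; the toy classes below only illustrate the base-class/derived-class framing, not the method itself.

```python
# Purely conceptual illustration of the object-oriented analogy:
# the one-shot subject inherits its semantic category's attributes
# and overrides only what is specific to itself.
class Dog:                       # base class: the semantic category
    legs = 4
    can_bark = True

class MyDog(Dog):                # derived class: the one-shot subject
    fur_color = "brown patches"  # subject-specific attribute
```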