Layout-to-Image Generation
18 papers with code • 7 benchmarks • 4 datasets
Layout-to-image generation is the task of generating a scene from a given layout. The layout specifies the locations of the objects to be included in the output image. In this section, you can find state-of-the-art leaderboards for layout-to-image generation.
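A layout is commonly represented as a set of object labels with bounding boxes. The sketch below is a minimal, illustrative encoding (the class names, coordinate convention, and helper names are assumptions, not from any specific paper): each object pairs a category label with a box in normalized (x_min, y_min, x_max, y_max) coordinates, which a generator would then render into an image.

```python
from dataclasses import dataclass

@dataclass
class LayoutObject:
    label: str    # object category, e.g. "dog" (illustrative)
    bbox: tuple   # (x_min, y_min, x_max, y_max), normalized to [0, 1]

def validate_layout(layout):
    """Check that every box is well-formed and lies inside the canvas."""
    for obj in layout:
        x0, y0, x1, y1 = obj.bbox
        assert 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0, obj
    return layout

# A toy three-object layout a generator could be conditioned on.
layout = validate_layout([
    LayoutObject("sky",   (0.0, 0.0, 1.0, 0.4)),
    LayoutObject("grass", (0.0, 0.4, 1.0, 1.0)),
    LayoutObject("dog",   (0.3, 0.5, 0.6, 0.9)),
])
```

Many of the datasets below (e.g. COCO-Stuff) additionally distinguish "thing" instances with boxes from amorphous "stuff" regions.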
Most implemented papers
AttrLostGAN: Attribute Controlled Image Synthesis from Reconfigurable Layout and Style
In this paper, we propose a method for attribute-controlled image synthesis from layout, which allows specifying the appearance of individual objects without affecting the rest of the image.
Interactive Image Synthesis with Panoptic Layout Generation
In particular, the stuff layouts can take amorphous shapes and fill up the missing regions left out by the instance layouts.
Modeling Image Composition for Complex Scene Generation
Compared to existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel/patch level and the object/patch level respectively, the proposed focal attention predicts the current patch token by attending only to the highly related tokens specified by the spatial layout, thereby achieving disambiguation during training.
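The core idea, restricting each patch token's attention to layout-related tokens, can be sketched as a boolean attention mask. This is a simplified illustration, not the paper's exact formulation: here we assume each patch is assigned to one layout object, and a patch may attend only to patches of the same object.

```python
def layout_attention_mask(patch_labels):
    """Build an N x N boolean attention mask from per-patch layout ids.

    patch_labels: sequence giving the layout-object id of each patch token.
    Returns mask[i][j] == True iff patch i may attend to patch j, i.e. the
    two patches belong to the same layout object (illustrative rule).
    """
    return [[a == b for b in patch_labels] for a in patch_labels]

# 4 patch tokens: the first two covered by object 0, the last two by object 1.
mask = layout_attention_mask([0, 0, 1, 1])
```

In a real model this mask would be applied additively (with -inf on disallowed positions) to the attention logits before the softmax.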
Freestyle Layout-to-Image Synthesis
In this work, we explore the freestyle capability of the model, i.e., how far it can generate unseen semantics (e.g., classes, attributes, and styles) onto a given layout, and call the task Freestyle LIS (FLIS).
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
In this paper, we propose LayoutBench, a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects.
Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive
Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout.
DivCon: Divide and Conquer for Progressive Text-to-Image Generation
To further improve T2I models' capability in numerical and spatial reasoning, the layout is employed as an intermediary to bridge large language models and layout-based diffusion models.