Story Visualization
20 papers with code • 3 benchmarks • 1 dataset
Story Visualization is the task of generating a coherent, text-aligned sequence of images from a sequence of textual captions that describe a story. It mainly consists of two tasks: story generation and story continuation. In story continuation, the model additionally receives ground-truth information in the form of the story's first frame.
Latest papers
StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
3) The story visualization and continuation models are trained and run at inference separately, which is not user-friendly.
Masked Generative Story Transformer with Character Guidance and Caption Augmentation
Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency across the frames of a generated image sequence.
Training-Free Consistent Text-to-Image Generation
Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Therefore, we introduce StoryGPT-V, which leverages the merits of latent diffusion models (LDMs) and large language models (LLMs) to produce images with consistent, high-quality characters grounded in the given story descriptions.
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study.
Story Visualization by Online Text Augmentation with Context Memory
Story visualization (SV) is a challenging text-to-image generation task. It requires not only rendering visual details from the text descriptions but also encoding long-term context across multiple sentences.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
TaleCrafter: Interactive Story Visualization with Multiple Characters
Accurate story visualization requires several elements. These include identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in the images.
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Our story generation experiments on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method outperforms the prior state of the art in generating high-quality frames that are consistent with the story. It also models appropriate correspondences between the characters and the background.
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity.