Story Visualization
20 papers with code • 3 benchmarks • 1 dataset
Story Visualization is the task of generating a coherent, text-aligned sequence of images from a sequence of captions that describe a story. It comprises two main tasks: story generation and story continuation. In story continuation, the model also receives the ground-truth first frame as additional input.
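The difference between the two settings can be sketched as a single loop over captions. This is a minimal illustration under stated assumptions, not any listed paper's method. A "frame" is a plain string standing in for an image. `StoryExample`, `visualize_story` and `toy_generator` are hypothetical names. The sketch generates frames one at a time, with each frame conditioned on the frames before it. That is only one possible design; some methods generate all frames jointly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in: a real system would produce image tensors, not strings.
Frame = str


@dataclass
class StoryExample:
    captions: List[str]                  # one caption per frame of the story
    first_frame: Optional[Frame] = None  # ground-truth frame 1 (continuation only)


def visualize_story(
    example: StoryExample,
    generate_frame: Callable[[str, Sequence[Frame]], Frame],
) -> List[Frame]:
    """Return one frame per caption.

    Story generation: every frame is produced from its caption.
    Story continuation: frame 1 is the supplied ground truth, and only the
    remaining captions are rendered.
    """
    frames: List[Frame] = []
    captions = example.captions
    if example.first_frame is not None:
        frames.append(example.first_frame)
        captions = captions[1:]
    for caption in captions:
        # Passing the frames produced so far is what lets a model keep
        # characters and backgrounds consistent across the sequence.
        frames.append(generate_frame(caption, tuple(frames)))
    return frames


def toy_generator(caption: str, history: Sequence[Frame]) -> Frame:
    """Hypothetical generator: tags the caption with how many frames it saw."""
    return f"{caption}|ctx={len(history)}"


if __name__ == "__main__":
    captions = ["A fox finds a map.", "The fox follows it.", "It reaches a lake."]
    print(visualize_story(StoryExample(captions), toy_generator))
    # ['A fox finds a map.|ctx=0', 'The fox follows it.|ctx=1', 'It reaches a lake.|ctx=2']
    print(visualize_story(StoryExample(captions, first_frame="GT frame 1"), toy_generator))
    # ['GT frame 1', 'The fox follows it.|ctx=1', 'It reaches a lake.|ctx=2']
```

In the continuation call, the model never renders the first caption. That caption's frame comes from the ground truth, and it serves as context for every later frame.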
Most implemented papers
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Our experiments on story generation on the MUGEN, PororoSV and FlintstonesSV datasets show that our method outperforms the prior state of the art, generating frames that have high visual quality and are consistent with the story. It also models appropriate correspondences between the characters and the background.
TaleCrafter: Interactive Story Visualization with Multiple Characters
Accurate story visualization requires several elements: identity consistency across frames, alignment between the plain text and the visual content, and a reasonable layout of objects in each image.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
Story Visualization by Online Text Augmentation with Context Memory
Story visualization (SV) is a challenging text-to-image generation task. It requires not only rendering visual details from the text descriptions but also encoding long-term context across multiple sentences.
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
We introduce StoryGPT-V, which combines the strengths of latent diffusion models (LDMs) and large language models (LLMs). It produces images with consistent, high-quality characters that are grounded in the given story descriptions.
Training-Free Consistent Text-to-Image Generation
Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language.
Masked Generative Story Transformer with Character Guidance and Caption Augmentation
Story Visualization (SV) is a challenging generative vision task that requires both high visual quality and consistency across the frames of the generated image sequence.