Story Visualization

20 papers with code • 3 benchmarks • 1 dataset

Story Visualization is the task of generating a coherent sequence of images that is aligned with a sequence of textual captions describing a story. It is mainly studied in two settings: story generation and story continuation. Story continuation additionally receives ground-truth information in the form of the first frame.
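The two task variants described above differ only in what the model receives as input. The following is a minimal sketch of that difference, not code from any of the listed papers. `StoryInput`, `task_setting` and `num_frames_to_generate` are hypothetical names. The sketch also assumes that the provided first frame corresponds to the first caption, which may not hold for every benchmark.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StoryInput:
    """Hypothetical container for one story (illustrative only)."""
    captions: Sequence[str]              # one caption per frame of the story
    first_frame: Optional[bytes] = None  # ground-truth first image, if provided


def task_setting(story: StoryInput) -> str:
    """Name the setting implied by the inputs."""
    if story.first_frame is not None:
        return "story continuation"
    return "story generation"


def num_frames_to_generate(story: StoryInput) -> int:
    """Count the frames a model would have to synthesize.

    Assumption: in continuation, the given first frame covers the first
    caption, so only the remaining captions still need images.
    """
    given = 1 if story.first_frame is not None else 0
    return max(len(story.captions) - given, 0)
```

Under these assumptions, a three-caption story needs three generated frames in the generation setting and two in the continuation setting.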

Most implemented papers

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

ubc-vision/make-a-story CVPR 2023

Our experiments for story generation on the MUGEN, PororoSV and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.

TaleCrafter: Interactive Story Visualization with Multiple Characters

videocrafter/talecrafter 29 May 2023

Accurate story visualization requires several elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images.

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

haoningwu3639/StoryGen 1 Jun 2023

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization by Online Text Augmentation with Context Memory

yonseivnl/cmota ICCV 2023

Story visualization (SV) is a challenging text-to-image generation task because it requires not only rendering visual details from the text descriptions but also encoding long-term context across multiple sentences.

The Chosen One: Consistent Characters in Text-to-Image Diffusion Models

ZichengDuan/TheChosenOne 16 Nov 2023

Our quantitative analysis demonstrates that our method strikes a better balance between prompt alignment and identity consistency compared to the baseline methods, and these findings are reinforced by a user study.

StoryGPT-V: Large Language Models as Consistent Story Visualizers

xiaoqian-shen/StoryGPT-V 4 Dec 2023

Therefore, we introduce StoryGPT-V, which leverages the merits of the latent diffusion (LDM) and LLM to produce images with consistent and high-quality characters grounded on given story descriptions.

Training-Free Consistent Text-to-Image Generation

kousw/experimental-consistory 5 Feb 2024

Text-to-image models offer a new level of creative flexibility by allowing users to guide the image generation process through natural language.

Masked Generative Story Transformer with Character Guidance and Caption Augmentation

chrispapa2000/maskgst 13 Mar 2024

Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency between different frames in generated image sequences.