Modular StoryGAN with Background and Theme Awareness for Story Visualization

ICPRAI 2022 (3rd International Conference on Pattern Recognition and Artificial Intelligence) · Gábor Szűcs, Modafar Al-Shouha

Story visualization is a novel topic at the intersection of computer vision and natural language processing. In this task, given a series of natural language sentences that compose a story, a sequence of images corresponding to the sentences should be generated. Prior works have introduced recurrent generative models that outperform text-to-image models on this task; however, local and global consistency remains a challenge for these solutions. To improve on this, we propose a new modular model architecture named Modular StoryGAN, which combines the most promising components of prior works. To measure local and global consistency, we introduce background and theme awareness, which are expected attributes of the solutions. In human evaluation, the generated images demonstrate that Modular StoryGAN possesses background and theme awareness. Besides this subjective evaluation, the objective evaluation also shows that our model outperforms the state-of-the-art CP-CSV and DuCo models.
