Modular StoryGAN with Background and Theme Awareness for Story Visualization

Story visualization is a novel topic at the intersection of computer vision and natural language processing. Given a series of natural language sentences that compose a story, the task is to generate a sequence of images that corresponds to the sentences. Prior works have introduced recurrent generative models that outperform text-to-image models on this task; however, maintaining local and global consistency remains a challenge for these solutions. To improve on this, we propose a new modular model architecture named Modular StoryGAN, which combines the most promising components of prior works. To measure local and global consistency, we introduce background and theme awareness, attributes that solutions are expected to possess. Human evaluation of the generated images indicates that Modular StoryGAN possesses background and theme awareness. Beyond this subjective evaluation, objective evaluation also shows that our model outperforms the state-of-the-art CP-CSV and DuCo models.
