Generating a Temporally Coherent Image Sequence for a Story by Multimodal Recurrent Transformers

ACL ARR November 2021 · Anonymous

Story visualization is a challenging text-to-image generation task because visual details must be rendered from abstract text descriptions. Beyond the difficulty of image generation itself, the generator also needs to conform to the narrative of a multi-sentence story input. While prior work in this domain has focused on improving the semantic relevance between generated images and input text, controlling the generated images to be temporally consistent remains a challenge. To generate a semantically coherent image sequence, we propose an explicit memory controller that can augment the temporal coherence of images in a multimodal autoregressive transformer, and we call the resulting method Story visualization by MultimodAl Recurrent Transformers, or SMART for short. Our method generates high-resolution, high-quality images, outperforming prior work by a significant margin across multiple evaluation metrics on the PororoSV dataset.
