The Visual Writing Prompts (VWP) dataset contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories, which were collected via crowdsourcing: workers were given an image sequence and up to 5 grounded characters from that sequence.

Dataset Details

Dataset Description

The Visual Writing Prompts (VWP) dataset is designed to support the development and testing of natural language processing models that generate stories from sequences of images. It comprises nearly 2,000 curated sequences of movie shots, each containing between 5 and 10 images. The images are selected so that each sequence depicts a coherent plot centered on one or more main characters, which strengthens the visual narrative structure for story generation. Aligned with these image sequences are approximately 12,000 stories written by crowd workers on Amazon Mechanical Turk. This setup aims to provide a rich, visually grounded storytelling context that helps models generate more coherent, diverse, and engaging stories.

  • Curated by: Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, Bernt Schiele
  • Funded by: See Acknowledgments in our paper
  • Language(s) (NLP): English
  • License: Apache License 2.0

Dataset Structure

The dataset is in a CSV file. The explanation of each column is in this table.
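As a minimal sketch of reading such a CSV file, the snippet below uses Python's standard csv module. Only the imdb_id column is named in this card; the "story" column and the in-memory sample are hypothetical stand-ins, so check the real header before relying on any column name.

```python
import csv
import io


def read_vwp_rows(csv_file):
    """Return the rows of a VWP-style CSV as a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


# In-memory stand-in for the real file. "story" is a hypothetical column name;
# imdb_id is the only column this card mentions explicitly.
sample = io.StringIO("imdb_id,story\ntt0112573,A placeholder story.\n")
rows = read_vwp_rows(sample)
columns = list(rows[0].keys())

print(columns)             # column names found in the header
print(rows[0]["imdb_id"])  # identifier of the movie for the first row
```

For the real data, pass a file object opened with newline="" and encoding="utf-8" instead of the in-memory sample; the file name depends on where you saved the download.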

Uses

Direct Use

The dataset is intended for use in natural language processing tasks, particularly for the development and evaluation of models designed to generate coherent and visually grounded stories from sequences of images.

Out-of-Scope Use

The copyrights of all movie shots belong to the original copyright holders, who are listed on the IMDb page of each movie. The IMDb page is identified by the value in the imdb_id column. For example, the imdb_id of the first row of our data is tt0112573, so the corresponding IMDb page is https://www.imdb.com/title/tt0112573/companycredits/. Do not violate these copyrights when using the images. Use of these images is limited to academic purposes.
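The mapping from an imdb_id to its company credits page can be sketched as follows. The URL pattern is taken from the example above; the function name and the "tt followed by digits" validation are assumptions made for illustration.

```python
import re

# Assumed shape of an IMDb title identifier: "tt" followed by digits.
IMDB_ID_PATTERN = re.compile(r"tt\d+")


def imdb_company_credits_url(imdb_id):
    """Build the IMDb company credits URL for a value from the imdb_id column."""
    imdb_id = imdb_id.strip()
    if not IMDB_ID_PATTERN.fullmatch(imdb_id):
        raise ValueError(f"unexpected imdb_id format: {imdb_id!r}")
    return f"https://www.imdb.com/title/{imdb_id}/companycredits/"


print(imdb_company_credits_url("tt0112573"))
```

Applying this function to every distinct imdb_id in the CSV gives the list of pages to consult for copyright holders.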

Dataset Creation

Curation Rationale

The dataset was curated to improve the quality of text stories generated from image sequences, focusing on visual storytelling with coherent plots and character grounding.

Source Data

Data Collection and Processing

The source data consists of image sequences extracted from movie shots in the MovieNet dataset (https://opendatalab.com/OpenDataLab/MovieNet/tree/main/raw). The sequences were selected so that each one shows a coherent plot around one or more main characters.

Who are the source data producers?

The images were originally produced by movie production companies and extracted by the authors of MovieNet. The stories were written by crowd workers and then compiled and refined by the authors.

Annotations

Annotation process

Crowdworkers were asked to write stories that fit the provided image sequences. The annotation process included reviewing these stories for coherence, grammatical correctness, and alignment with the images. More details are in our paper.

Who are the annotators?

The annotators were five graduate students from Saarland University. Two are native English speakers. The other three are proficient in English.

Personal and Sensitive Information

We do not collect personal or sensitive information. Personal information such as worker IDs is not released. Our anonymization process is described in our paper.

Bias, Risks, and Limitations

The stories in this dataset are in English only. Although we have tried our best to filter the images and review the stories, it is not possible to go through all the stories. There could still be biased or harmful content. Please use the dataset carefully.

Citation

Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, and Bernt Schiele. 2023. Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences. Transactions of the Association for Computational Linguistics, 11:565–581.

BibTeX:

@article{10.1162/tacl_a_00553,
author = {Hong, Xudong and Sayeed, Asad and Mehra, Khushboo and Demberg, Vera and Schiele, Bernt},
title = "{Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences}",
journal = {Transactions of the Association for Computational Linguistics},
volume = {11},
pages = {565-581},
year = {2023},
month = {06},
issn = {2307-387X},
doi = {10.1162/tacl_a_00553},
url = {https://doi.org/10.1162/tacl_a_00553},
eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00553/2134487/tacl_a_00553.pdf},
}

Dataset Card Authors

Xudong Hong

Dataset Card Contact

xLASTNAME@coli.uni-saarland.de

Disclaimer:

All images are extracted from movie shots in the MovieNet dataset (https://opendatalab.com/OpenDataLab/MovieNet/tree/main/raw). The copyrights of all movie shots belong to the original copyright holders, who are listed on the IMDb page of each movie. The IMDb page is identified by the value in the imdb_id column. For example, the imdb_id of the first row of our data is tt0112573, so the corresponding IMDb page is https://www.imdb.com/title/tt0112573/companycredits/. Do not violate these copyrights when using the images. We use these images for academic purposes only. Please contact the author if you have any questions.
