Visual Writing Prompts Dataset

**[Hugging Face Datasets (New!)](https://huggingface.co/datasets/tonyhong/vwp)** | **[Website](https://vwprompt.github.io/)** | **[GitHub Repository](https://github.com/vwprompt/vwp)** | **[arXiv e-Print](https://arxiv.org/abs/2301.08571)**

The Visual Writing Prompts (VWP) dataset contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories, which were collected via crowdsourcing: workers were given an image sequence and up to 5 grounded characters from that sequence.

## Dataset Details

### Links

- **TACL 2023 Paper:** [Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences](https://doi.org/10.1162/tacl_a_00553)

### Dataset Description

The Visual Writing Prompts (VWP) dataset is designed to facilitate the development and testing of natural language processing models that generate stories based on sequences of images. It comprises nearly 2,000 curated sequences of movie shots, each containing between 5 and 10 images. The images are selected so that each sequence depicts a coherent plot centered on one or more main characters, which strengthens the visual narrative structure for story generation. Aligned with these image sequences are approximately 12,000 stories written by crowd workers on Amazon Mechanical Turk. This setup aims to provide a rich, visually grounded storytelling context that helps models generate more coherent, diverse, and engaging stories.

- **Curated by:** Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, Bernt Schiele
- **Funded by:** See Acknowledgments in our paper
- **Language(s) (NLP):** English
- **License:** Apache License 2.0

## Dataset Structure

The dataset is in a CSV file. The explanation of each column is in [this table](https://github.com/vwprompt/vwp/blob/main/column_explain.csv).
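
As a minimal sketch of reading such a file, assuming it starts with a header row, the snippet below groups rows by movie using only the Python standard library. The `imdb_id` column is the one described in this card; the `story` column name and the sample rows are hypothetical placeholders, so consult the column table linked above for the real column names.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_movie(csv_stream):
    """Group CSV rows by their `imdb_id` value.

    `csv_stream` is any text file object whose first line is a header row.
    """
    rows_by_movie = defaultdict(list)
    for row in csv.DictReader(csv_stream):
        rows_by_movie[row["imdb_id"]].append(row)
    return dict(rows_by_movie)


# Hypothetical two-column sample standing in for the real CSV file.
# Only `imdb_id` is a documented column; `story` is a placeholder name.
sample_csv = io.StringIO(
    "imdb_id,story\n"
    "tt0112573,First placeholder story.\n"
    "tt0112573,Second placeholder story.\n"
    "tt0000000,Third placeholder story.\n"
)

grouped = group_rows_by_movie(sample_csv)
for movie_id, rows in grouped.items():
    print(movie_id, len(rows))
```

To run this on the actual data, pass an opened file instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points to your local copy of the CSV.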

## Uses

### Direct Use

The dataset is intended for use in natural language processing tasks, particularly for the development and evaluation of models designed to generate coherent and visually grounded stories from sequences of images.

### Out-of-Scope Use

The copyrights of all movie shots belong to the original copyright holders, who are listed on the IMDb page of each movie. The IMDb page is identified by the value in the `imdb_id` column. For example, the first row of our data has the `imdb_id` `tt0112573`, so the corresponding IMDb page is https://www.imdb.com/title/tt0112573/companycredits/. Do not violate the copyrights while using these images. The usage of these images is limited to academic purposes.
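
The mapping from an `imdb_id` value to its company credits page can be sketched as a small helper. The URL pattern follows the example above; the function name and the validation rule (`tt` followed by digits) are assumptions made for illustration, not part of the dataset tooling.

```python
import re

# Assumption: identifiers look like "tt" followed by one or more digits,
# as in the example `tt0112573`.
IMDB_ID_PATTERN = re.compile(r"tt\d+")


def company_credits_url(imdb_id: str) -> str:
    """Build the IMDb company credits URL for a given `imdb_id` value."""
    if not IMDB_ID_PATTERN.fullmatch(imdb_id):
        raise ValueError(f"Unexpected imdb_id format: {imdb_id!r}")
    return f"https://www.imdb.com/title/{imdb_id}/companycredits/"


print(company_credits_url("tt0112573"))
# prints https://www.imdb.com/title/tt0112573/companycredits/
```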

## Dataset Creation

### Curation Rationale

The dataset was curated to improve the quality of text stories generated from image sequences, focusing on visual storytelling with coherent plots and character grounding.

### Source Data

### Data Collection and Processing

The source data consists of image sequences extracted from movie shots in the MovieNet dataset (https://opendatalab.com/OpenDataLab/MovieNet/tree/main/raw). The sequences were selected so that each one depicts a coherent plot around one or more main characters.

### Who are the source data producers?

The images were originally produced by movie production companies and extracted by the authors of MovieNet. The stories were written by crowd workers and then compiled and refined by the authors.

### Annotations

### Annotation process

Crowd workers were asked to write stories that fit the provided image sequences. The annotation process included reviewing these stories for coherence, grammatical correctness, and alignment with the images. More details are in our paper.

### Who are the annotators?

The annotators were five graduate students from Saarland University. Two are native English speakers. The other three are proficient in English.

### Personal and Sensitive Information

We do not collect personal or sensitive information. Personal information such as worker IDs is not released. Our anonymization process is described in our paper.

## Bias, Risks, and Limitations

The stories in this dataset are in English only. Although we have tried our best to filter the images and review the stories, it was not feasible to review every story, so the dataset could still contain biased or harmful content. Please use the dataset carefully.

## Citation

Xudong Hong, Asad Sayeed, Khushboo Mehra, Vera Demberg, and Bernt Schiele. 2023. [Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences](https://aclanthology.org/2023.tacl-1.33). *Transactions of the Association for Computational Linguistics*, 11:565–581.

**BibTeX:**

```latex
@article{10.1162/tacl_a_00553,
author = {Hong, Xudong and Sayeed, Asad and Mehra, Khushboo and Demberg, Vera and Schiele, Bernt},
title = "{Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences}",
journal = {Transactions of the Association for Computational Linguistics},
volume = {11},
pages = {565-581},
year = {2023},
month = {06},
issn = {2307-387X},
doi = {10.1162/tacl_a_00553},
url = {https://doi.org/10.1162/tacl_a_00553},
eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00553/2134487/tacl_a_00553.pdf},
}
```

## Dataset Card Authors

Xudong Hong

## Dataset Card Contact

[xLASTNAME@coli.uni-saarland.de](mailto:xLASTNAME@coli.uni-saarland.de)

## Disclaimer

All images are extracted from movie shots in the MovieNet dataset (https://opendatalab.com/OpenDataLab/MovieNet/tree/main/raw). The copyrights of all movie shots belong to the original copyright holders, who are listed on the IMDb page of each movie. The IMDb page is identified by the value in the `imdb_id` column. For example, the first row of our data has the `imdb_id` `tt0112573`, so the corresponding IMDb page is https://www.imdb.com/title/tt0112573/companycredits/. Do not violate the copyrights while using these images. We only use these images for academic purposes. Please contact the author if you have any questions.
