The OCW dataset is for evaluating creative problem solving tasks by curating the problems and human performance results from the popular British quiz show Only Connect.

The OCW dataset contains 618 connecting wall puzzles and solutions in total from 15 seasons of the show. Each show episode has two walls.

The dataset has two tasks: Task 1 (Grouping), and Task 2 (Connections) are identical to the quiz-show’s human participant tasks.

Task 1 (Groupings) is evaluated via six metrics: number of solved walls, number of correct groups (max. four per wall), Adjusted Mutual Information (AMI), Adjusted Rand Index (ARI), Fowlkes Mallows Score (FMS), and Wasserstein Distance (WD), normalized to (0, 1) range, between predicted and ground-truth labels.

Task 2 (Connections) is evaluated with three metrics: exact string matching, ROUGE-1 F1, and BERTScore F1.

Baseline results with pre-trained language models and with few-shot In-context Learning (ICL) with LLMs such as GPT-4 are available here:

"Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset" Saeid Alavi Naeini, Raeid Saqur, Mozhgan Saeidi, John Giorgi, Babak Taati. 2023 https://neurips.cc/virtual/2023/poster/73547

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


Modalities


Languages