2 dataset results for General Knowledge AND Texts

BIG-bench (Beyond the Imitation Game Benchmark)

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.

222 PAPERS • 134 BENCHMARKS

BEAR-big (Benchmark for Evaluating Associative Reasoning)

The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating common factual knowledge contained in language models.

1 PAPER • NO BENCHMARKS YET
