The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.
222 PAPERS • 134 BENCHMARKS
The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating common factual knowledge contained in language models.
1 PAPER • NO BENCHMARKS YET