BioLAMA is a benchmark comprising 49K biomedical factual knowledge triples for probing biomedical language models. It is used to assess whether language models can serve as valid biomedical knowledge bases.
13 PAPERS • 1 BENCHMARK
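Probing benchmarks like this one are typically evaluated by rendering each (subject, relation, object) triple as a cloze-style prompt and checking whether the model fills the masked slot with the gold object. The sketch below illustrates that loop under stated assumptions: `toy_triples`, `templates`, and the stub `predict_object` are hypothetical stand-ins, not part of BioLAMA or any specific model API.

```python
# Minimal sketch of cloze-style knowledge probing. The triples, templates,
# and predictor below are illustrative stand-ins, not BioLAMA data.

def triple_to_prompt(subj, relation_template):
    """Render a (subject, relation) pair as a cloze prompt with a [MASK] slot."""
    return relation_template.format(subj=subj, obj="[MASK]")

def probe_accuracy(triples, templates, predict_object):
    """Fraction of triples whose gold object the model fills into the mask."""
    hits = 0
    for subj, rel, obj in triples:
        prompt = triple_to_prompt(subj, templates[rel])
        if predict_object(prompt) == obj:
            hits += 1
    return hits / len(triples)

# Toy data standing in for biomedical triples.
toy_triples = [("aspirin", "treats", "pain"), ("insulin", "treats", "diabetes")]
templates = {"treats": "{subj} is a drug used to treat {obj}."}

# A stub predictor standing in for a real masked LM.
def predict_object(prompt):
    return "pain" if "aspirin" in prompt else "diabetes"

print(probe_accuracy(toy_triples, templates, predict_object))  # 1.0
```

In practice the stub predictor would be replaced by a masked language model scoring candidate fillers for the `[MASK]` position.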
KAMEL comprises knowledge about 234 relations from Wikidata, with large training, validation, and test splits. All facts are verified to also appear in Wikipedia, so they have been seen during the pre-training of the LMs being probed. Most importantly, it overcomes the limitations of existing probing datasets by (1) covering a larger variety of knowledge-graph relations, (2) containing single- and multi-token entities, (3) using relations with literals, (4) providing alternative labels for entities, (5) introducing an evaluation procedure for higher-cardinality relations, which was missing in previous work, and (6) being usable with causal LMs.
5 PAPERS • 1 BENCHMARK
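Two of the features listed above, alternative entity labels and higher-cardinality relations, change how a prediction is scored: a model's answer should count as correct if it matches any alias of any of the gold objects. The sketch below shows one plausible way to implement such a check; the alias table, names, and data are illustrative assumptions, not KAMEL's actual evaluation code.

```python
# Hedged sketch of alias-aware evaluation for higher-cardinality relations:
# a prediction is a hit if it matches any alternative label of any gold
# object. The alias table and examples are illustrative, not from KAMEL.

def is_hit(prediction, gold_objects, aliases):
    """True if the normalized prediction matches any alias of any gold object."""
    pred = prediction.strip().lower()
    for obj in gold_objects:
        # Fall back to the canonical label when no aliases are listed.
        if pred in {a.lower() for a in aliases.get(obj, {obj})}:
            return True
    return False

# Toy example: a relation with two correct objects, each with alt labels.
aliases = {
    "United States": {"United States", "USA", "United States of America"},
    "United Kingdom": {"United Kingdom", "UK"},
}
gold = ["United States", "United Kingdom"]

print(is_hit("usa", gold, aliases))     # True
print(is_hit("France", gold, aliases))  # False
```

Scoring against the full alias set avoids penalizing a model for producing a correct entity under a different surface form.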
The $\text{BEAR}$ dataset and its larger version, $\text{BEAR}_{\text{big}}$, are benchmarks for evaluating the common factual knowledge contained in language models.
1 PAPER • 1 BENCHMARK