4 dataset results for Memorization AND Texts

BIG-bench (Beyond the Imitation Game Benchmark)

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. Big-bench include more than 200 tasks.

222 PAPERS • 134 BENCHMARKS

DS-1000

DS-1000 is a code generation benchmark with a thousand data science questions spanning seven Python libraries that (1) reflects diverse, realistic, and practical use cases, (2) has a reliable metric, (3) defends against memorization by perturbing questions.

30 PAPERS • NO BENCHMARKS YET

PopQA

PopQA is an open-domain QA dataset with 14k QA pairs with fine-grained Wikidata entity ID, Wikipedia page views, and relationship type information.

21 PAPERS • NO BENCHMARKS YET

LM Email Address Leakage

Are Large Pre-Trained Language Models Leaking Your Personal Information? We analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner's name.

1 PAPER • NO BENCHMARKS YET

Datasets

4 dataset results for Memorization AND Texts