SummEdits is a benchmark designed to measure the ability of Large Language Models (LLMs) to reason about facts and detect factual inconsistencies between a document and its summary. It was proposed together with a new protocol for creating inconsistency detection benchmarks, in which verified-consistent summaries are minimally edited and each edited summary is labeled as factually consistent or inconsistent with its source document.

The SummEdits benchmark covers 10 domains. It is about 20 times more cost-effective per sample to build than previous benchmarks and highly reproducible, with an estimated inter-annotator agreement of about 0.91.
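
To make the task concrete, below is a minimal sketch of what a SummEdits-style evaluation loop might look like: the model is shown a document and a candidate summary and must judge whether the summary is factually consistent, with performance scored as balanced accuracy over the binary labels. The field names (`doc`, `summary`, `label`), the prompt wording, and the `query_llm` helper are illustrative assumptions, not the benchmark's official interface.

```python
# Sketch of a binary consistency-detection evaluation, assuming each sample
# pairs a source document with a (possibly edited) summary and a gold label.
from typing import Dict, List
from sklearn.metrics import balanced_accuracy_score


def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the LLM under evaluation."""
    raise NotImplementedError("plug in your model's API here")


def evaluate(samples: List[Dict]) -> float:
    """Score the model's consistent/inconsistent predictions with balanced accuracy."""
    gold, pred = [], []
    for sample in samples:
        prompt = (
            "Decide whether the summary is factually consistent with the "
            "document. Answer 'consistent' or 'inconsistent'.\n\n"
            f"Document:\n{sample['doc']}\n\nSummary:\n{sample['summary']}\n"
        )
        answer = query_llm(prompt).strip().lower()
        pred.append(1 if answer.startswith("consistent") else 0)
        gold.append(1 if sample["label"] == "consistent" else 0)
    return balanced_accuracy_score(gold, pred)
```

Balanced accuracy is a natural choice here because the consistent and inconsistent classes are not guaranteed to be evenly represented.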

