SummEdits is a benchmark that measures the ability of Large Language Models (LLMs) to reason about facts and detect factual inconsistencies in summaries. It also introduces a new protocol for creating inconsistency-detection benchmarks.
The benchmark spans 10 domains, is roughly 20 times more cost-effective per sample than previous benchmarks, and is highly reproducible, with an estimated inter-annotator agreement of about 0.91.
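The core task is binary: given a document and a summary, decide whether the summary is consistent or inconsistent with the document. A minimal sketch of scoring a detector on such data is below; the sample schema, field names, and the `toy_detector` stub are illustrative assumptions, not the benchmark's actual format, and balanced accuracy is used here as a class-skew-robust metric.

```python
# Sketch of scoring an inconsistency detector on SummEdits-style data.
# Each sample pairs a summary with a "consistent"/"inconsistent" label;
# field names are hypothetical, not the benchmark's real schema.

def balanced_accuracy(labels, predictions):
    """Mean of per-class recall, robust to imbalanced label counts."""
    recalls = []
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        correct = sum(1 for i in idx if predictions[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

samples = [
    {"summary": "...", "label": "consistent"},
    {"summary": "...", "label": "inconsistent"},
    {"summary": "...", "label": "consistent"},
]

def toy_detector(sample):
    # Placeholder: a real detector would prompt an LLM with the document
    # and summary, then parse a consistent/inconsistent verdict.
    return "consistent"

gold = [s["label"] for s in samples]
preds = [toy_detector(s) for s in samples]
print(balanced_accuracy(gold, preds))  # 0.5: perfect on one class, 0 on the other
```

A detector that always answers "consistent" scores only 0.5 balanced accuracy regardless of the label split, which is why a skew-robust metric is preferable to plain accuracy here.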
| Paper | Code | Results | Date | Stars |
|---|---|---|---|---|