SummEdits is a benchmark that measures the ability of Large Language Models (LLMs) to reason about facts and detect factual inconsistencies in summaries. It also introduces a new protocol for creating inconsistency-detection benchmarks.
The benchmark spans 10 domains, is roughly 20 times more cost-effective per sample than previous benchmarks, and is highly reproducible, with an estimated inter-annotator agreement of about 0.91.
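The core task is binary: given a document and a summary, decide whether the summary is consistent or inconsistent with the document. A minimal sketch of scoring a detector on such data is below; the sample schema, field names, and the `toy_detector` stub are illustrative assumptions, not the benchmark's actual format, and balanced accuracy is used here as a class-skew-robust metric.

```python
# Sketch of scoring an inconsistency detector on SummEdits-style data.
# Each sample pairs a summary with a "consistent"/"inconsistent" label;
# field names are hypothetical, not the benchmark's real schema.

def balanced_accuracy(labels, predictions):
    """Mean of per-class recall, robust to imbalanced label counts."""
    recalls = []
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        correct = sum(1 for i in idx if predictions[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

samples = [
    {"summary": "...", "label": "consistent"},
    {"summary": "...", "label": "inconsistent"},
    {"summary": "...", "label": "consistent"},
]

def toy_detector(sample):
    # Placeholder: a real detector would prompt an LLM with the document
    # and summary, then parse a consistent/inconsistent verdict.
    return "consistent"

gold = [s["label"] for s in samples]
preds = [toy_detector(s) for s in samples]
print(balanced_accuracy(gold, preds))  # 0.5: perfect on one class, 0 on the other
```

A detector that always answers "consistent" scores only 0.5 balanced accuracy regardless of the label split, which is why a skew-robust metric is preferable to plain accuracy here.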
| Paper | Code | Results | Date | Stars |
|---|---|---|---|---|