The Robust Summarization Evaluation Benchmark is a large human evaluation dataset with over 22k summary-level annotations of state-of-the-art systems across three datasets.
Source: Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation