WikiLingua includes ~770k article and summary pairs in 18 languages from WikiHow. Gold-standard article-summary alignments across languages are extracted by aligning the images that are used to describe each how-to step in an article.
50 PAPERS • 5 BENCHMARKS
Global Voices is a multilingual dataset for evaluating cross-lingual summarization methods. It is extracted from social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages.
8 PAPERS • NO BENCHMARKS YET
Fanpage dataset, containing news articles taken from Fanpage.
2 PAPERS • 1 BENCHMARK
IlPost dataset, containing news articles taken from IlPost.
The MLSum-it dataset is the translated version (Helsinki-NLP/opus-mt-es-it) of the spanish portion of MLSum, containing news articles taken from BBC/mundo.
1 PAPER • 1 BENCHMARK