We provide a new dataset, XWikiRef, for the task of Cross-lingual Multi-document Summarization. This task aims at generating Wikipedia-style text in low-resource languages by taking reference text as input. Overall, the dataset covers 8 languages: Bengali (bn), English (en), Hindi (hi), Marathi (mr), Malayalam (ml), Odia (or), Punjabi (pa) and Tamil (ta). It also covers 5 domains: books, films, politicians, sportsmen and writers.
1 PAPER • 1 BENCHMARK
needadvice is a dataset for advice classification extracted from Reddit. In this dataset, posts are annotated for whether they contain advice or not. It contains 6,148 samples for training, 816 for validation and 898 for testing.
1 PAPER • NO BENCHMARKS YET