4 dataset results for Topic Classification AND English

The Yahoo! Answers topic classification dataset is constructed using 10 largest main categories. Each class contains 140,000 training samples and 6,000 testing samples. Therefore, the total number of training samples is 1,400,000 and testing samples 60,000 in this dataset. From all the answers and other meta-information, we only used the best answer content and the main category information. Source:github

121 PAPERS • 2 BENCHMARKS

Amazon Product Data

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.

33 PAPERS • 6 BENCHMARKS

20Newsgroup (10 tasks)

This dataset has 20 classes and each class has about 1000 documents. The data split for train/validation/test is 1600/200/200. We created 10 tasks, 2 classes per task. Since this is topic-based text classification data, the classes are very different and have little shared knowledge. As mentioned above, this application (and dataset) is mainly used to show a CL model's ability to overcome forgetting. Detailed statistics please on page https://github.com/ZixuanKe/PyContinual

11 PAPERS • 1 BENCHMARK

Mapping Topics in 100,000 Real-Life Moral Dilemmas

Mapping Topics in 100,000 Real-Life Moral Dilemmas (Tuan Dung nguyen)

This dataset accompanies the ICWSM 2022 paper "Mapping Topics in 100,000 Real-Life Moral Dilemmas".

1 PAPER • NO BENCHMARKS YET

Datasets

4 dataset results for Topic Classification AND English