NatCat is a general-purpose text categorization dataset built from three online resources: Wikipedia, Reddit, and Stack Exchange. Its constituent datasets consist of document-category pairs derived from the manual curation that these communities perform naturally.
Source: Natcat: Weakly Supervised Text Classification with Naturally Annotated Datasets