8TAGS

Introduced by Dadas et al. in Evaluation of Sentence Representations in Polish

The 8TAGS dataset is a corpus created specifically for evaluating sentence representations in Polish. It consists of approximately 50,000 sentences, each annotated with one of eight topic labels: film, history, food, medicine, motorization, work, sport, and technology. The dataset was generated automatically by extracting sentences from the headlines and short descriptions of articles posted on wykop.pl, a Polish social networking site. The corpus contains only cleaned and tokenized sentences that are unambiguous (tagged with exactly one of the selected categories) and longer than 30 characters. Classification accuracy on this dataset is reported as part of the evaluation of sentence representations in Polish.
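The selection criteria described above, and the accuracy metric reported for the dataset, can be sketched in Python as follows. This is a hypothetical illustration, not the authors' extraction pipeline. The `(text, tags)` record format, the helper names `keep_sentence` and `accuracy`, and the reading of "unambiguous" as "exactly one of the eight labels appears among the post's tags" are all assumptions. The English label strings are taken from the description and may differ from those used in the released files.

```python
from typing import Iterable

# The eight topic labels named in the dataset description.
# Assumption: English names; the released files may use other strings.
LABELS = frozenset({
    "film", "history", "food", "medicine",
    "motorization", "work", "sport", "technology",
})

MIN_CHARS = 30  # sentences must be longer than 30 characters


def keep_sentence(text: str, tags: Iterable[str]) -> bool:
    """Return True if a hypothetical (text, tags) record passes both filters.

    Assumption: a sentence is "unambiguous" when exactly one of the eight
    labels occurs among its tags; tags outside LABELS are ignored.
    """
    matched = LABELS.intersection(tags)
    return len(matched) == 1 and len(text.strip()) > MIN_CHARS


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Classification accuracy: the fraction of sentences labelled correctly."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical records used only to exercise the filters.
samples = [
    ("Nowy samochód elektryczny trafi do salonów w przyszłym roku", ["motorization"]),
    ("Krótki wpis", ["sport"]),  # rejected: 30 characters or fewer
    ("Lekarze ostrzegają przed modną dietą popularną w sieci", ["medicine", "food"]),  # rejected: two labels
]
kept = [(text, tags) for text, tags in samples if keep_sentence(text, tags)]

print(len(kept))  # 1
print(accuracy(["sport", "film", "work", "food"],
               ["sport", "film", "food", "food"]))  # 0.75
```

Using a set intersection means that unrelated tags on a post do not make it ambiguous; only tags that belong to the eight topic labels count toward the "exactly one" rule.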

Similar Datasets

PolEmo 2.0

PSC

PPC

KLEJ

License

Unknown