5 dataset results for Token Classification AND Texts

CoNLL 2003

CoNLL-2003 is a named entity recognition dataset released as part of the CoNLL-2003 shared task on language-independent named entity recognition. The data consists of eight files covering two languages, English and German. For each language there is a training file, a development file, a test file, and a large file of unannotated data.
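The CoNLL-2003 files are plain text with one token per line, whitespace-separated columns whose last column is the NER tag, blank lines between sentences, and -DOCSTART- lines between documents. Below is a minimal reader sketch that assumes this layout. The function name read_conll and the sample lines (an invented pair of sentences, not taken from the dataset) are hypothetical. Reading the last column as the tag is a deliberate choice so the same loop does not depend on how many columns precede it.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag) pairs


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-2003-style lines.

    Assumption: one token per line, whitespace-separated columns with the
    token first and the NER tag last, blank lines between sentences and
    -DOCSTART- lines between documents.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line or a document marker closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        sentence.append((columns[0], columns[-1]))
    if sentence:  # the input may not end with a blank line
        yield sentence


# Hypothetical sample in the assumed layout (token, POS, chunk, NER).
SAMPLE = """-DOCSTART- -X- -X- O

Maria NNP B-NP B-PER
visited VBD B-VP O
Berlin NNP B-NP B-LOC
. . O O

Siemens NNP B-NP B-ORG
opened VBD B-VP O
an DT B-NP O
office NN I-NP O
. . O O
"""

sentences = list(read_conll(SAMPLE.splitlines()))
print(len(sentences), sentences[0])
```

For a real file, passing an open file handle instead of SAMPLE.splitlines() works the same way, since the reader only iterates over lines.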

639 PAPERS • 16 BENCHMARKS

CoNLL 2002

The CoNLL-2002 shared task concerns language-independent named entity recognition. Four types of named entities are annotated: persons, locations, organizations, and miscellaneous entities that do not belong to the previous three groups. Participants were offered training and test data for two languages, Spanish and Dutch, and were allowed to use information sources other than the training data.
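As a token classification task, each token receives a tag that combines a position prefix with one of the four entity types above. The sketch below assumes an IOB2-style scheme (B-X starts an entity of type X, I-X continues it, O marks tokens outside any entity) and groups such tags back into typed spans. The helper name bio_to_spans and the example sentence and tags are hypothetical illustrations, not official CoNLL-2002 tooling or data.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group IOB2-style tags (B-X, I-X, O) into typed entity spans.

    Assumption: tags look like "B-PER", "I-LOC" or "O".
    """
    spans: List[Span] = []
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # An open span ends on O, on B-, or on an I- tag of a different type.
        if current_type is not None and (prefix != "I" or label != current_type):
            spans.append((current_type, start, i))
            current_type = None
        # B- opens a span; a stray I- with nothing open is treated as a start.
        if prefix == "B" or (prefix == "I" and current_type is None):
            current_type, start = label, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Invented example sentence with illustrative tags.
tokens = ["Juan", "Pérez", "visitó", "Madrid", "con", "la", "ONU", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O", "B-ORG", "O"]

spans = bio_to_spans(tags)
entities = [(label, " ".join(tokens[s:e])) for label, s, e in spans]
print(entities)
```

Treating a stray I- tag as the start of a new entity is one common lenient convention; a stricter evaluator might instead discard such spans.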

70 PAPERS • 3 BENCHMARKS

XTREME (Cross-Lingual Transfer Evaluation of Multilingual Encoders)

The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark was introduced to encourage more research on multilingual transfer learning. XTREME covers 40 typologically diverse languages spanning 12 language families and includes 9 tasks that require reasoning about different levels of syntax or semantics.

47 PAPERS • 2 BENCHMARKS

WikiNEuRal

WikiNEuRal is a high-quality, automatically generated dataset for multilingual named entity recognition.

5 PAPERS • NO BENCHMARKS YET

LeNER-Br

LeNER-Br is a dataset for named entity recognition (NER) in Brazilian legal texts.

4 PAPERS • 2 BENCHMARKS
