WikiANN (PAN-X)

Introduced by Pan et al. in Cross-lingual Name Tagging and Linking for 282 Languages

WikiANN, also known as PAN-X, is a multilingual named entity recognition (NER) dataset. It consists of Wikipedia articles annotated with LOC (location), PER (person), and ORG (organization) tags in the IOB2 format (1, 2). In IOB2, the first token of each entity is tagged B-<type>, each following token of the same entity is tagged I-<type>, and tokens outside any entity are tagged O. The dataset is used to train and evaluate NER models across many languages; the original paper covers 282.

The annotated entities are the people, places, and organizations mentioned in Wikipedia articles. Researchers and practitioners can use WikiANN to build and evaluate natural language processing systems that identify and classify named entities in text.
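
To make the IOB2 tagging concrete, here is a minimal Python sketch that groups token-level tags into entity spans. The function name `iob2_to_spans` and the example sentence are hypothetical and not taken from the dataset; joining tokens with a space is also an assumption that does not hold for languages written without whitespace.

```python
from typing import List, Tuple


def iob2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB2-tagged tokens into (entity_type, entity_text) pairs.

    Illustrative helper only: tokens are joined with a single space,
    and an I- tag that does not continue an open entity of the same
    type is treated like "O" (the token is dropped).
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O", or a stray I- tag: close any open entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []

    # Flush an entity that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical example sentence, not drawn from WikiANN.
tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(iob2_to_spans(tokens, tags))
```

Published copies of the dataset may store tags as integer class IDs rather than strings; in that case the IDs would need to be mapped to label names before calling a helper like this one.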

(1) wikiann · Datasets at Hugging Face. https://huggingface.co/datasets/wikiann
(2) wikiann | TensorFlow Datasets. https://tensorflow.google.cn/datasets/catalog/wikiann
(3) wikiann · Datasets at Hugging Face. https://huggingface.co/datasets/wikiann/viewer/en
(4) WikiAnn Dataset | Papers With Code. https://paperswithcode.com/dataset/wikiann-1
