Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 18 datasets annotated with named entities in a cross-lingually consistent schema across 12 diverse languages. In this paper, we detail the creation and composition of the UNER datasets; we also provide initial modeling baselines in both in-language and cross-lingual learning settings. We release the data, code, and fitted models to the public.

arXiv 2023

Results

Task                            Dataset                       Model             F1 (micro)  Global Rank
Cross-Lingual NER               UNER v1 (Cebuano)             UNER XLM-R (all)  69.6        #1
Named Entity Recognition (NER)  UNER v1 (Chinese)             UNER XLM-R        89.5        #1
Cross-Lingual NER               UNER v1 (Chinese)             UNER XLM-R (all)  88.2        #1
Named Entity Recognition (NER)  UNER v1 (Chinese Simplified)  UNER XLM-R        89.4        #1
Cross-Lingual NER               UNER v1 (Chinese Simplified)  UNER XLM-R (all)  87.7        #1
Named Entity Recognition (NER)  UNER v1 (Croatian)            UNER XLM-R        93.6        #1
Cross-Lingual NER               UNER v1 (Croatian)            UNER XLM-R (all)  90.9        #1
Cross-Lingual NER               UNER v1 (Danish)              UNER XLM-R (all)  83.0        #1
Named Entity Recognition (NER)  UNER v1 (Danish)              UNER XLM-R        82.7        #1
Cross-Lingual NER               UNER v1 (English)             UNER XLM-R (all)  82.8        #1
Named Entity Recognition (NER)  UNER v1 (English)             UNER XLM-R        86.0        #1
Cross-Lingual NER               UNER v1 (Portuguese)          UNER XLM-R (all)  82.3        #1
Named Entity Recognition (NER)  UNER v1 (Portuguese)          UNER XLM-R        90.4        #1
Cross-Lingual NER               UNER v1 - PUD (Chinese)       UNER XLM-R (all)  86.0        #1
Named Entity Recognition (NER)  UNER v1 - PUD (Chinese)       UNER XLM-R        87.1        #1
Cross-Lingual NER               UNER v1 - PUD (English)       UNER XLM-R (all)  79.5        #1
Named Entity Recognition (NER)  UNER v1 - PUD (English)       UNER XLM-R        80.1        #1
Cross-Lingual NER               UNER v1 - PUD (German)        UNER XLM-R (all)  78.9        #1
Named Entity Recognition (NER)  UNER v1 - PUD (Portuguese)    UNER XLM-R        88.8        #1
Cross-Lingual NER               UNER v1 - PUD (Portuguese)    UNER XLM-R (all)  85.1        #1
Cross-Lingual NER               UNER v1 - PUD (Russian)       UNER XLM-R (all)  70.6        #1
Cross-Lingual NER               UNER v1 - PUD (Swedish)       UNER XLM-R (all)  85.3        #1
Named Entity Recognition (NER)  UNER v1 - PUD (Swedish)       UNER XLM-R        82.2        #1
Named Entity Recognition (NER)  UNER v1 (Serbian)             UNER XLM-R        94.7        #1
Cross-Lingual NER               UNER v1 (Serbian)             UNER XLM-R (all)  95.2        #1
Named Entity Recognition (NER)  UNER v1 (Slovak)              UNER XLM-R        85.5        #1
Cross-Lingual NER               UNER v1 (Slovak)              UNER XLM-R (all)  81.6        #1
Cross-Lingual NER               UNER v1 (Swedish)             UNER XLM-R (all)  88.2        #1
Named Entity Recognition (NER)  UNER v1 (Swedish)             UNER XLM-R        88.3        #1
Cross-Lingual NER               UNER v1 (Tagalog T)           UNER XLM-R (all)  91.3        #1
Cross-Lingual NER               UNER v1 (Tagalog U)           UNER XLM-R (all)  63.8        #1
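
Every result in the table is reported as F1 (micro). The page does not define the metric further. As an illustration of what entity-level micro F1 conventionally means for NER, the sketch below decodes gold and predicted IOB2 tag sequences into typed spans. It counts an entity as correct only when both its boundaries and its type match exactly, and it pools true positives over all entities before computing precision, recall, and their harmonic mean. This is a minimal sketch under assumptions, not the UNER evaluation script. It assumes IOB2 tags such as B-PER / I-PER / O and exact-span matching. The labels PER, ORG, LOC are illustrative, and the helper names `bio_to_spans` and `micro_f1` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical helpers illustrating entity-level micro F1 (exact span + type match).
# Assumes IOB2 tags like "B-PER", "I-PER", "O"; not the official UNER scorer.

Span = Tuple[int, int, int, str]  # (sentence index, start token, end token exclusive, type)


def bio_to_spans(tags: List[str], sent_id: int) -> Set[Span]:
    """Decode one sentence's IOB2 tags into a set of typed entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        # Close the open entity on "O", on a new "B-", or on an "I-" of another type.
        if etype is not None and (prefix != "I" or label != etype):
            spans.add((sent_id, start, i, etype))
            start, etype = None, None
        # Open a new entity on "B-"; a stray "I-" with no open entity also starts one.
        if prefix == "B" or (prefix == "I" and etype is None):
            start, etype = i, label
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Pool entity matches over all sentences, then take the harmonic mean of P and R."""
    gold_spans: Set[Span] = set()
    pred_spans: Set[Span] = set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= bio_to_spans(g, k)
        pred_spans |= bio_to_spans(p, k)
    tp = len(gold_spans & pred_spans)  # exact boundary and type matches
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: the second entity of sentence 0 is predicted with the wrong type.
    gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O"]]
    pred = [["B-PER", "I-PER", "O", "B-ORG"], ["B-ORG", "O"]]
    print(round(micro_f1(gold, pred), 4))  # 0.6667 (2 of 3 entities correct)
```

Pooling counts across all sentences and entity types, rather than averaging per-type scores, is what makes the score "micro". A macro average over entity types can differ noticeably when some types are rare.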