CamemBERT: a Tasty French Language Model

ACL 2020
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot

Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available models have either been trained on English data or on the concatenation of data in multiple languages...

Results from the Paper

TASK                        DATASET            MODEL                         METRIC      VALUE   GLOBAL RANK
Dependency Parsing          French GSD         CamemBERT                     LAS         92.47   #1
Dependency Parsing          French GSD         CamemBERT                     UAS         94.82   #1
Part-Of-Speech Tagging      French GSD         CamemBERT                     UPOS        98.19   #1
Named Entity Recognition    French Treebank    CamemBERT (subword masking)   F1          87.93   #1
Named Entity Recognition    French Treebank    CamemBERT (subword masking)   Precision   88.35   #1
Named Entity Recognition    French Treebank    CamemBERT (subword masking)   Recall      87.46   #1
Part-Of-Speech Tagging      ParTUT             CamemBERT                     UPOS        97.63   #1
Dependency Parsing          ParTUT             CamemBERT                     LAS         92.9    #1
Dependency Parsing          ParTUT             CamemBERT                     UAS         95.21   #1
Dependency Parsing          Sequoia Treebank   CamemBERT                     LAS         94.39   #1
Dependency Parsing          Sequoia Treebank   CamemBERT                     UAS         95.56   #1
Part-Of-Speech Tagging      Sequoia Treebank   CamemBERT                     UPOS        99.21   #1
Dependency Parsing          Spoken Corpus      CamemBERT                     LAS         80.07   #1
Dependency Parsing          Spoken Corpus      CamemBERT                     UAS         86.05   #1
Part-Of-Speech Tagging      Spoken Corpus      CamemBERT                     UPOS        96.68   #1
Natural Language Inference  XNLI French        CamemBERT                     Accuracy    81.2    #1
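The table reports LAS and UAS for dependency parsing and UPOS accuracy for part-of-speech tagging. As a rough illustration of what these columns measure, here is a minimal Python sketch that scores one sentence under the standard definitions: UAS is the share of tokens whose predicted syntactic head is correct, LAS is the share whose head and dependency label are both correct, and UPOS accuracy is the share of tokens with the correct universal POS tag. It assumes gold tokenization. The function names and the toy French sentence are hypothetical, and this is not the evaluation code used in the paper; official CoNLL-style scorers also align system and gold tokenization, which this sketch does not.

```python
from typing import List, Tuple

# Hypothetical representation: each token's analysis is (head_index, dependency_label),
# with head_index 0 meaning the token attaches to the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages, assuming gold and predicted tokens are aligned 1:1."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    head_ok = 0            # tokens with the correct head (counts toward UAS)
    head_and_label_ok = 0  # tokens with the correct head AND label (counts toward LAS)
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        if g_head == p_head:
            head_ok += 1
            if g_label == p_label:
                head_and_label_ok += 1
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * head_and_label_ok / n


def upos_accuracy(gold_tags: List[str], pred_tags: List[str]) -> float:
    """Return the percentage of tokens whose predicted UPOS tag matches the gold tag."""
    if len(gold_tags) != len(pred_tags) or not gold_tags:
        raise ValueError("tag sequences must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return 100.0 * correct / len(gold_tags)


# Toy example (hypothetical): "Le chat dort ." with tokens numbered 1..4.
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "punct")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "punct")]  # one wrong label, one wrong head

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=75.00 LAS=50.00
print(upos_accuracy(["DET", "NOUN", "VERB", "PUNCT"], ["DET", "NOUN", "NOUN", "PUNCT"]))  # 75.0
```

LAS is never higher than UAS, because a token can only count toward LAS once its head is already correct; this matches every dependency-parsing pair in the table.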