RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model

24 May 2021 · Milan Straka, Jakub Náplava, Jana Straková, David Samuel

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data. RoBERTa is a robustly optimized Transformer-based pretraining approach. We show that RobeCzech considerably outperforms equally-sized multilingual and Czech-trained contextualized language representation models, surpasses the previous state of the art in all five evaluated NLP tasks, and reaches state-of-the-art results in four of them. The RobeCzech model is released publicly at https://hdl.handle.net/11234/1-3691 and https://huggingface.co/ufal/robeczech-base.
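As a quick illustration of the public release (not code from the paper), the checkpoint can be loaded through the Hugging Face transformers library. The snippet below is a minimal sketch under the assumption that the transformers and torch packages are installed; the model identifier ufal/robeczech-base comes from the release URL above.

```python
# Minimal sketch: load RobeCzech and extract contextualized embeddings.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModel.from_pretrained("ufal/robeczech-base")

# Encode a Czech sentence and run a forward pass without gradients.
inputs = tokenizer("RobeCzech je monolingvální český model.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token representations: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```

Because RobeCzech is a RoBERTa-style encoder, the last hidden states serve as contextualized token representations that downstream task heads (e.g., taggers or parsers) can consume.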


Datasets

PTG (Czech, MRP 2020)

Results

Task               Dataset                 Model               Metric   Value   Global Rank
Semantic Parsing   PTG (Czech, MRP 2020)   PERIN + RobeCzech   F1       92.36   #1
