A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

ACL 2020. Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot

We use the multilingual OSCAR corpus, extracted from Common Crawl via language classification, filtering and cleaning, to train monolingual contextualized word embeddings (ELMo) for five mid-resource languages. We then compare the performance of OSCAR-based and Wikipedia-based ELMo embeddings for these languages on the part-of-speech tagging and parsing tasks...

