Search Results for author: Jelmer Van der Linde

Found 5 papers, 2 papers with code

The EuroPat Corpus: A Parallel Corpus of European Patent Data

no code implementations LREC 2022 Kenneth Heafield, Elaine Farrow, Jelmer Van der Linde, Gema Ramírez-Sánchez, Dion Wiggins

We present the EuroPat corpus of patent-specific parallel data for 6 official European languages paired with English: German, Spanish, French, Croatian, Norwegian, and Polish.

Machine Translation, Translation

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations 20 Mar 2024 Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling, Machine Translation, +2
