Search Results for author: Miquel Esplà-Gomis

Found 12 papers, 4 papers with code

Surprise Language Challenge: Developing a Neural Machine Translation System between Pashto and English in Two Months

no code implementations • MTSummit 2021 • Alexandra Birch, Barry Haddow, Antonio Valerio Miceli Barone, Jindrich Helcl, Jonas Waldendorf, Felipe Sánchez Martínez, Mikel Forcada, Víctor Sánchez Cartagena, Juan Antonio Pérez-Ortiz, Miquel Esplà-Gomis, Wilker Aziz, Lina Murady, Sevi Sariisik, Peggy van der Kreeft, Kay Macquarrie

We find that starting from an existing large model pre-trained on 50languages leads to far better BLEU scores than pretraining on one high-resource language pair with a smaller model.

Machine Translation Transfer Learning +1

Paper
Add Code

Building Domain-specific Corpora from the Web: the Case of European Digital Service Infrastructures

no code implementations • LREC (BUCC) 2022 • Rik van Noord, Cristian García-Romero, Miquel Esplà-Gomis, Leopoldo Pla Sempere, Antonio Toral

An important goal of the MaCoCu project is to improve EU-specific NLP systems that concern their Digital Service Infrastructures (DSIs).

Machine Translation Zero-Shot Learning

Paper
Add Code

An English-Swahili parallel corpus and its use for neural machine translation in the news domain

no code implementations • EAMT 2020 • Felipe Sánchez-Martínez, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Mikel L. Forcada, Miquel Esplà-Gomis, Andrew Secker, Susie Coleman, Julie Wall

This paper describes our approach to create a neural machine translation system to translate between English and Swahili (both directions) in the news domain, as well as the process we followed to crawl the necessary parallel corpora from the Internet.

Machine Translation Translation

Paper
Add Code

Bicleaner at WMT 2020: Universitat d’Alacant-Prompsit’s submission to the parallel corpus filtering shared task

no code implementations • WMT (EMNLP) 2020 • Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Jaume Zaragoza-Bernabeu, Felipe Sánchez-Martínez

This paper describes the joint submission of Universitat d’Alacant and Prompsit Language Engineering to the WMT 2020 shared task on parallel corpus filtering.

Paper
Add Code

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

no code implementations • EAMT 2022 • Marta Bañón, Miquel Esplà-Gomis, Mikel L. Forcada, Cristian García-Romero, Taja Kuzman, Nikola Ljubešić, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Tobias van der Werff, Jaume Zaragoza

We introduce the project “MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages”, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages.

Paper
Add Code

One-parameter models for sentence-level post-editing effort estimation

no code implementations • MTSummit 2017 • Mikel L. Forcada, Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Lucia Specia

Sentence

Paper
Add Code

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

no code implementations • 13 Mar 2024 • Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral

Large, curated, web-crawled corpora play a vital role in training language models (LMs).

Paper
Add Code

Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

1 code implementation • 29 Jan 2024 • Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

When the amount of parallel sentences available to train a neural machine translation is scarce, a common practice is to generate new synthetic training samples from them.

Machine Translation Translation

Paper
Code

Cross-lingual neural fuzzy matching for exploiting target-language monolingual corpora in computer-aided translation

1 code implementation • 16 Jan 2024 • Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

The paper presents an automatic evaluation of these techniques on four language pairs that shows that our approach can successfully exploit monolingual texts in a TM-based CAT environment, increasing the amount of useful translation proposals, and that our neural model for estimating the post-editing effort enables the combination of translation proposals obtained from monolingual corpora and from TMs in the usual way.

Sentence Sentence Embeddings +1

Paper
Code

Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach

1 code implementation • EMNLP 2021 • Víctor M. Sánchez-Cartagena, Miquel Esplà-Gomis, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez

Many DA approaches aim at expanding the support of the empirical data distribution by generating new sentence pairs that contain infrequent words, thus making it closer to the true data distribution of parallel sentences.

Data Augmentation Low-Resource Neural Machine Translation +3

Paper
Code

Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality

1 code implementation • EMNLP (IWSLT) 2019 • Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis, Lucia Specia

To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort.

Machine Translation Translation

Paper
Code

UAlacant machine translation quality estimation at WMT 2018: a simple approach using phrase tables and feed-forward neural networks

no code implementations • WS 2018 • Miquel Esplà-Gomis, Felipe Sánchez-Martínez, Mikel L. Forcada

We describe the Universitat d'Alacant submissions to the word- and sentence-level machine translation (MT) quality estimation (QE) shared task at WMT 2018.

Machine Translation Sentence +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.