Search Results for author: Gertjan van Noord

Found 37 papers, 14 papers with code

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

no code implementations WMT (EMNLP) 2021 Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one.

Translation · Unsupervised Machine Translation

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

1 code implementation EAMT 2020 Lukas Edman, Antonio Toral, Gertjan van Noord

Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data.

NMT · Translation +2

Simple Embedding-Based Word Sense Disambiguation

no code implementations GWC 2018 Dieke Oele, Gertjan van Noord

The results of our experiments show that lexically extending the set of words in the gloss and context, although it works well for other implementations of Lesk, harms our method.

Word Sense Disambiguation
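For illustration, a minimal sketch of the embedding-based Lesk idea (an assumption about the general approach, not necessarily the paper's exact method): pick the sense whose averaged gloss embedding is closest to the averaged context embedding. All vectors, senses, and glosses below are toy stand-ins.

```python
# Minimal sketch of embedding-based Lesk word sense disambiguation:
# choose the sense whose averaged gloss embedding is most similar to
# the averaged context embedding. Toy vectors stand in for real
# pretrained embeddings; the sense inventory is hypothetical.
import numpy as np

EMB = {  # toy word embeddings (real systems use pretrained vectors)
    "bank": np.array([0.5, 0.5]), "money": np.array([1.0, 0.1]),
    "deposit": np.array([0.9, 0.2]), "river": np.array([0.1, 1.0]),
    "water": np.array([0.2, 0.9]), "cash": np.array([0.95, 0.15]),
}

def embed(words):
    """Average the embeddings of all known words."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def disambiguate(context, senses):
    """Return the sense whose gloss is closest to the context."""
    ctx = embed(context)
    return max(senses, key=lambda s: cosine(ctx, embed(senses[s])))

senses = {  # hypothetical sense inventory with gloss words
    "bank.financial": ["money", "deposit", "cash"],
    "bank.river": ["river", "water"],
}
print(disambiguate(["deposit", "money"], senses))  # -> bank.financial
```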

Data Selection for Unsupervised Translation of German–Upper Sorbian

no code implementations WMT (EMNLP) 2020 Lukas Edman, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2020 Unsupervised Machine Translation task for German–Upper Sorbian.

Translation · Unsupervised Machine Translation

Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages

no code implementations LREC 2022 Prajit Dhar, Arianna Bisazza, Gertjan van Noord

We conduct our evaluation on four typologically diverse target MRLs, and find that PT-Inflect surpasses NMT systems trained only on parallel data.

Machine Translation · NMT +1

Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation

1 code implementation 28 Feb 2023 Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza

Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks.

Machine Translation · NMT +1

Subword-Delimited Downsampling for Better Character-Level Translation

1 code implementation 2 Dec 2022 Lukas Edman, Antonio Toral, Gertjan van Noord

This new downsampling method not only outperforms existing downsampling methods, showing that downsampling characters can be done without sacrificing quality, but also leads to promising performance compared to subword models for translation.

Machine Translation · Translation
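The core idea suggested by the title, pooling character representations within subword boundaries rather than at a fixed rate, can be sketched as follows (shapes, names, and the mean-pooling choice are illustrative assumptions, not the paper's exact architecture):

```python
# Illustrative sketch (an assumption from the title/abstract, not the
# paper's exact model): downsample a character sequence by mean-pooling
# character vectors within each subword span, so the sequence shortens
# at linguistically meaningful boundaries.
import torch

def subword_pool(char_states, spans):
    """char_states: (seq_len, dim) character representations.
    spans: list of (start, end) subword boundaries covering the sequence.
    Returns one pooled vector per subword: (n_subwords, dim)."""
    return torch.stack([char_states[s:e].mean(dim=0) for s, e in spans])

chars = torch.randn(10, 16)         # 10 character states, 16 dims
spans = [(0, 5), (5, 8), (8, 10)]   # hypothetical subword boundaries
pooled = subword_pool(chars, spans)
print(pooled.shape)                 # torch.Size([3, 16])
```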

Patching Leaks in the Charformer for Efficient Character-Level Generation

1 code implementation 27 May 2022 Lukas Edman, Antonio Toral, Gertjan van Noord

Character-based representations have important advantages over subword-based ones for morphologically rich languages.

NMT · Translation

The Importance of Context in Very Low Resource Language Modeling

no code implementations ICON 2021 Lukas Edman, Antonio Toral, Gertjan van Noord

This paper investigates very low resource language model pretraining, when fewer than 100 thousand sentences are available.

Language Modelling · POS +1

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

1 code implementation 24 Sep 2021 Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU.

Translation · Unsupervised Machine Translation

UDapter: Language Adaptation for Truly Universal Dependency Parsing

1 code implementation EMNLP 2020 Ahmet Üstün, Arianna Bisazza, Gosse Bouma, Gertjan van Noord

The resulting parser, UDapter, outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages, showing the success of the proposed adaptation approach.

Dependency Parsing · Transfer Learning
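A generic residual bottleneck adapter, the standard building block behind adapter-based language adaptation, is sketched below. UDapter's actual mechanism generates adapter parameters from language embeddings, which this simplified example omits:

```python
# Generic residual bottleneck adapter, the usual building block of
# adapter-based adaptation. Simplified sketch only: UDapter itself
# generates adapter weights from language embeddings, not shown here.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, dim)    # project back up
        self.act = nn.ReLU()

    def forward(self, hidden):
        # Residual connection keeps the frozen backbone's signal intact.
        return hidden + self.up(self.act(self.down(hidden)))

layer = Adapter(dim=768)
out = layer(torch.randn(2, 12, 768))  # (batch, seq_len, hidden)
print(out.shape)
```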

BERTje: A Dutch BERT Model

2 code implementations 19 Dec 2019 Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks.

Language Modelling · named-entity-recognition +5
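BERTje can be loaded with the Hugging Face transformers library; the model identifier below is the one the authors publish on the Hugging Face hub (assumed here; check the BERTje repository if it has moved):

```python
# Loading BERTje with Hugging Face transformers. The model id is an
# assumption based on the authors' published hub entry.
from transformers import AutoModel, AutoTokenizer

name = "GroNLP/bert-base-dutch-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Het is een mooie dag.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```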

Cross-Lingual Word Embeddings for Morphologically Rich Languages

no code implementations RANLP 2019 Ahmet Üstün, Gosse Bouma, Gertjan van Noord

Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language.

Cross-Lingual Word Embeddings · Translation +2
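The shared-space idea can be made concrete with a toy nearest-neighbour lookup (all vectors below are made up; real systems learn the space from data, e.g. with a bilingual dictionary):

```python
# Toy illustration of a shared cross-lingual embedding space: words from
# two languages live in one vector space, so translation candidates can
# be retrieved by nearest-neighbour search. Vectors here are invented.
import numpy as np

shared = {
    ("en", "dog"): np.array([0.9, 0.1]),
    ("en", "cat"): np.array([0.1, 0.9]),
    ("nl", "hond"): np.array([0.88, 0.12]),  # Dutch "dog"
    ("nl", "kat"): np.array([0.12, 0.88]),   # Dutch "cat"
}

def nearest(query, lang):
    """Most similar word in `lang` to the query word's vector."""
    q = shared[query]
    cands = {k: v for k, v in shared.items() if k[0] == lang}
    return max(cands, key=lambda k: q @ cands[k] /
               (np.linalg.norm(q) * np.linalg.norm(cands[k])))

print(nearest(("en", "dog"), "nl"))  # -> ('nl', 'hond')
```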

Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?

no code implementations CL 2018 Martijn Wieling, Josine Rawee, Gertjan van Noord

For a selection of ten papers, we attempted to reproduce the results using the provided data and code.

Modeling Input Uncertainty in Neural Network Dependency Parsing

1 code implementation EMNLP 2018 Rob van der Goot, Gertjan van Noord

Recently introduced neural network parsers allow for new approaches to circumvent data sparsity issues by modeling character level information and by exploiting raw data in a semi-supervised setting.

Dependency Parsing · Lexical Normalization +1

MoNoise: Modeling Noise Using a Modular Normalization System

2 code implementations 10 Oct 2017 Rob van der Goot, Gertjan van Noord

We show that MoNoise beats the state-of-the-art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly differently.

Lexical Normalization · Spelling Correction +1
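MoNoise's modular design, candidate generation followed by candidate ranking, can be sketched as a generate-and-rank pipeline (the modules and scorer below are toy stand-ins for the real system's spelling correction, word embeddings, and trained ranker):

```python
# Stripped-down generate-and-rank normalization pipeline in the spirit
# of MoNoise's modular design. Candidate modules and scorer are toy
# stand-ins, not the real system's components.
LEXICON = {"you", "are", "great", "see", "tomorrow"}
SLANG = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow"}

def candidates(word):
    """Each module proposes candidates; the original word is always one."""
    cands = {word}
    if word in SLANG:                  # module 1: slang dictionary
        cands.add(SLANG[word])
    cands.update(w for w in LEXICON    # module 2: crude 1-substitution match
                 if len(w) == len(word) and
                 sum(a != b for a, b in zip(w, word)) == 1)
    return cands

def score(word, cand):
    """Toy ranker: prefer in-lexicon candidates, then the original form."""
    return (cand in LEXICON, cand == word)

def normalize(tokens):
    return [max(candidates(t), key=lambda c: score(t, c)) for t in tokens]

print(normalize(["u", "r", "gr8"]))  # -> ['you', 'are', 'great']
```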

The Power of Character N-grams in Native Language Identification

no code implementations WS 2017 Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan van Noord, Barbara Plank, Martijn Wieling

In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017.

Native Language Identification · Text Classification
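The described setup, a linear SVM over language-independent character n-grams, corresponds to a standard scikit-learn pipeline (the two-sentence corpus and labels below are placeholders, not shared-task data):

```python
# A linear SVM over character n-gram features, in its standard
# scikit-learn form. The tiny "corpus" and L1 labels are placeholders;
# the NLI shared task data is not bundled here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I am agree with this opinion.",       # toy L1-influenced English
         "In my country is very common this."]
labels = ["L1_A", "L1_B"]                       # hypothetical L1 labels

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 4)),  # char 1-4 grams
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.predict(["Is very common this opinion."]))
```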

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

1 code implementation NAACL 2016 Simon Šuster, Ivan Titov, Gertjan van Noord

We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information.

Sentence · Word Embeddings

Word Representations, Tree Models and Syntactic Functions

1 code implementation 31 Aug 2015 Simon Šuster, Gertjan van Noord, Ivan Titov

Word representations induced from models with discrete latent variables (e.g. HMMs) have been shown to be beneficial in many NLP applications.

named-entity-recognition · Named Entity Recognition +2

Treelet Probabilities for HPSG Parsing and Error Correction

no code implementations LREC 2014 Angelina Ivanova, Gertjan van Noord

In the second experiment, the model is tested for its ability to score the parse tree of the correct sentence higher than the constituency tree of the original version of the sentence containing a grammatical error.

Grammatical Error Correction · Sentence
