Search Results for author: Gertjan van Noord

Found 37 papers, 14 papers with code

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

no code implementations WMT (EMNLP) 2021 Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one.

Translation · Unsupervised Machine Translation

Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

1 code implementation EAMT 2020 Lukas Edman, Antonio Toral, Gertjan van Noord

Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data.

NMT · Translation +2

Simple Embedding-Based Word Sense Disambiguation

no code implementations GWC 2018 Dieke Oele, Gertjan van Noord

The results of our experiments show that lexically extending the set of words in the gloss and context, although it works well for other implementations of Lesk, harms our method.

Word Sense Disambiguation
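For illustration, a minimal sketch of the embedding-based Lesk idea (an assumption about the general approach, not necessarily the paper's exact method): pick the sense whose averaged gloss embedding is closest to the averaged context embedding. All vectors, senses, and glosses below are toy stand-ins.

```python
# Minimal sketch of embedding-based Lesk word sense disambiguation:
# choose the sense whose averaged gloss embedding is most similar to
# the averaged context embedding. Toy vectors stand in for real
# pretrained embeddings; the sense inventory is hypothetical.
import numpy as np

EMB = {  # toy word embeddings (real systems use pretrained vectors)
    "bank": np.array([0.5, 0.5]), "money": np.array([1.0, 0.1]),
    "deposit": np.array([0.9, 0.2]), "river": np.array([0.1, 1.0]),
    "water": np.array([0.2, 0.9]), "cash": np.array([0.95, 0.15]),
}

def embed(words):
    """Average the embeddings of all known words."""
    vecs = [EMB[w] for w in words if w in EMB]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def disambiguate(context, senses):
    """Return the sense whose gloss is closest to the context."""
    ctx = embed(context)
    return max(senses, key=lambda s: cosine(ctx, embed(senses[s])))

senses = {  # hypothetical sense inventory with gloss words
    "bank.financial": ["money", "deposit", "cash"],
    "bank.river": ["river", "water"],
}
print(disambiguate(["deposit", "money"], senses))  # -> bank.financial
```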

Data Selection for Unsupervised Translation of German–Upper Sorbian

no code implementations WMT (EMNLP) 2020 Lukas Edman, Antonio Toral, Gertjan van Noord

This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2020 Unsupervised Machine Translation task for German–Upper Sorbian.

Translation · Unsupervised Machine Translation

Evaluating Pre-training Objectives for Low-Resource Translation into Morphologically Rich Languages

no code implementations LREC 2022 Prajit Dhar, Arianna Bisazza, Gertjan van Noord

We conduct our evaluation on four typologically diverse target MRLs, and find that PT-Inflect surpasses NMT systems trained only on parallel data.

Machine Translation · NMT +1

Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation

1 code implementation 28 Feb 2023 Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza

Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks.

Machine Translation · NMT +1

Subword-Delimited Downsampling for Better Character-Level Translation

1 code implementation 2 Dec 2022 Lukas Edman, Antonio Toral, Gertjan van Noord

This new downsampling method not only outperforms existing downsampling methods, showing that downsampling characters can be done without sacrificing quality, but also leads to promising performance compared to subword models for translation.

Machine Translation · Translation
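The core idea suggested by the title, pooling character representations within subword boundaries rather than at a fixed rate, can be sketched as follows (shapes, names, and the mean-pooling choice are illustrative assumptions, not the paper's exact architecture):

```python
# Illustrative sketch (an assumption from the title/abstract, not the
# paper's exact model): downsample a character sequence by mean-pooling
# character vectors within each subword span, so the sequence shortens
# at linguistically meaningful boundaries.
import torch

def subword_pool(char_states, spans):
    """char_states: (seq_len, dim) character representations.
    spans: list of (start, end) subword boundaries covering the sequence.
    Returns one pooled vector per subword: (n_subwords, dim)."""
    return torch.stack([char_states[s:e].mean(dim=0) for s, e in spans])

chars = torch.randn(10, 16)         # 10 character states, 16 dims
spans = [(0, 5), (5, 8), (8, 10)]   # hypothetical subword boundaries
pooled = subword_pool(chars, spans)
print(pooled.shape)                 # torch.Size([3, 16])
```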

Patching Leaks in the Charformer for Efficient Character-Level Generation

1 code implementation 27 May 2022 Lukas Edman, Antonio Toral, Gertjan van Noord

Character-based representations have important advantages over subword-based ones for morphologically rich languages.

NMT · Translation

The Importance of Context in Very Low Resource Language Modeling

no code implementations ICON 2021 Lukas Edman, Antonio Toral, Gertjan van Noord

This paper investigates very low resource language model pretraining, when fewer than 100 thousand sentences are available.

Language Modelling · POS +1

Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language

1 code implementation 24 Sep 2021 Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord

Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU.

Translation · Unsupervised Machine Translation

UDapter: Language Adaptation for Truly Universal Dependency Parsing

1 code implementation EMNLP 2020 Ahmet Üstün, Arianna Bisazza, Gosse Bouma, Gertjan van Noord

The resulting parser, UDapter, outperforms strong monolingual and multilingual baselines on the majority of both high-resource and low-resource (zero-shot) languages, showing the success of the proposed adaptation approach.

Dependency Parsing · Transfer Learning
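A generic residual bottleneck adapter, the standard building block behind adapter-based language adaptation, is sketched below. UDapter's actual mechanism generates adapter parameters from language embeddings, which this simplified example omits:

```python
# Generic residual bottleneck adapter, the usual building block of
# adapter-based adaptation. Simplified sketch only: UDapter itself
# generates adapter weights from language embeddings, not shown here.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, dim)    # project back up
        self.act = nn.ReLU()

    def forward(self, hidden):
        # Residual connection keeps the frozen backbone's signal intact.
        return hidden + self.up(self.act(self.down(hidden)))

layer = Adapter(dim=768)
out = layer(torch.randn(2, 12, 768))  # (batch, seq_len, hidden)
print(out.shape)
```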

BERTje: A Dutch BERT Model

2 code implementations 19 Dec 2019 Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks.

Language Modelling · named-entity-recognition +5
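BERTje can be loaded with the Hugging Face transformers library; the model identifier below is the one the authors publish on the Hugging Face hub (assumed here; check the BERTje repository if it has moved):

```python
# Loading BERTje with Hugging Face transformers. The model id is an
# assumption based on the authors' published hub entry.
from transformers import AutoModel, AutoTokenizer

name = "GroNLP/bert-base-dutch-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Het is een mooie dag.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```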

Cross-Lingual Word Embeddings for Morphologically Rich Languages

no code implementations RANLP 2019 Ahmet Üstün, Gosse Bouma, Gertjan van Noord

Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language.

Cross-Lingual Word Embeddings · Translation +2
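The shared-space idea can be made concrete with a toy nearest-neighbour lookup (all vectors below are made up; real systems learn the space from data, e.g. with a bilingual dictionary):

```python
# Toy illustration of a shared cross-lingual embedding space: words from
# two languages live in one vector space, so translation candidates can
# be retrieved by nearest-neighbour search. Vectors here are invented.
import numpy as np

shared = {
    ("en", "dog"): np.array([0.9, 0.1]),
    ("en", "cat"): np.array([0.1, 0.9]),
    ("nl", "hond"): np.array([0.88, 0.12]),  # Dutch "dog"
    ("nl", "kat"): np.array([0.12, 0.88]),   # Dutch "cat"
}

def nearest(query, lang):
    """Most similar word in `lang` to the query word's vector."""
    q = shared[query]
    cands = {k: v for k, v in shared.items() if k[0] == lang}
    return max(cands, key=lambda k: q @ cands[k] /
               (np.linalg.norm(q) * np.linalg.norm(cands[k])))

print(nearest(("en", "dog"), "nl"))  # -> ('nl', 'hond')
```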

Squib: Reproducibility in Computational Linguistics: Are We Willing to Share?

no code implementations CL 2018 Martijn Wieling, Josine Rawee, Gertjan van Noord

For a selection of ten papers, we attempted to reproduce the results using the provided data and code.

Modeling Input Uncertainty in Neural Network Dependency Parsing

1 code implementation EMNLP 2018 Rob van der Goot, Gertjan van Noord

Recently introduced neural network parsers allow for new approaches to circumvent data sparsity issues by modeling character level information and by exploiting raw data in a semi-supervised setting.

Dependency Parsing · Lexical Normalization +1

MoNoise: Modeling Noise Using a Modular Normalization System

2 code implementations 10 Oct 2017 Rob van der Goot, Gertjan van Noord

We show that MoNoise beats the state-of-the-art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly differently.

Lexical Normalization · Spelling Correction +1
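MoNoise's modular design, candidate generation followed by candidate ranking, can be sketched as a generate-and-rank pipeline (the modules and scorer below are toy stand-ins for the real system's spelling correction, word embeddings, and trained ranker):

```python
# Stripped-down generate-and-rank normalization pipeline in the spirit
# of MoNoise's modular design. Candidate modules and scorer are toy
# stand-ins, not the real system's components.
LEXICON = {"you", "are", "great", "see", "tomorrow"}
SLANG = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow"}

def candidates(word):
    """Each module proposes candidates; the original word is always one."""
    cands = {word}
    if word in SLANG:                  # module 1: slang dictionary
        cands.add(SLANG[word])
    cands.update(w for w in LEXICON    # module 2: crude 1-substitution match
                 if len(w) == len(word) and
                 sum(a != b for a, b in zip(w, word)) == 1)
    return cands

def score(word, cand):
    """Toy ranker: prefer in-lexicon candidates, then the original form."""
    return (cand in LEXICON, cand == word)

def normalize(tokens):
    return [max(candidates(t), key=lambda c: score(t, c)) for t in tokens]

print(normalize(["u", "r", "gr8"]))  # -> ['you', 'are', 'great']
```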

The Power of Character N-grams in Native Language Identification

no code implementations WS 2017 Artur Kulmizev, Bo Blankers, Johannes Bjerva, Malvina Nissim, Gertjan van Noord, Barbara Plank, Martijn Wieling

In this paper, we explore the performance of a linear SVM trained on language independent character features for the NLI Shared Task 2017.

Native Language Identification · Text Classification
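The described setup, a linear SVM over language-independent character n-grams, corresponds to a standard scikit-learn pipeline (the two-sentence corpus and labels below are placeholders, not shared-task data):

```python
# A linear SVM over character n-gram features, in its standard
# scikit-learn form. The tiny "corpus" and L1 labels are placeholders;
# the NLI shared task data is not bundled here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["I am agree with this opinion.",       # toy L1-influenced English
         "In my country is very common this."]
labels = ["L1_A", "L1_B"]                       # hypothetical L1 labels

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 4)),  # char 1-4 grams
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.predict(["Is very common this opinion."]))
```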

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

1 code implementation NAACL 2016 Simon Šuster, Ivan Titov, Gertjan van Noord

We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information.

Sentence · Word Embeddings

Word Representations, Tree Models and Syntactic Functions

1 code implementation 31 Aug 2015 Simon Šuster, Gertjan van Noord, Ivan Titov

Word representations induced from models with discrete latent variables (e.g. HMMs) have been shown to be beneficial in many NLP applications.

named-entity-recognition · Named Entity Recognition +2

Treelet Probabilities for HPSG Parsing and Error Correction

no code implementations LREC 2014 Angelina Ivanova, Gertjan van Noord

In the second experiment, the model is tested for its ability to score the parse tree of the correct sentence higher than the constituency tree of the original version of the sentence containing a grammatical error.

Grammatical Error Correction · Sentence
