Search Results for author: Jörg Tiedemann

Found 51 papers, 12 papers with code

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation

no code implementations NoDaLiDa 2021 Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann

Evaluating the results on an in-domain test set and a small out-of-domain set, we find that RBMT backtranslation clearly outperforms NMT backtranslation on the out-of-domain test set, and slightly even on the in-domain data, although the NMT backtranslation model itself achieved clearly better BLEU scores than the RBMT system.

Machine Translation NMT +2

The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT

no code implementations WMT (EMNLP) 2020 Jörg Tiedemann

This paper describes the development of a new benchmark for machine translation that provides training and test data for thousands of language pairs covering over 500 languages and tools for creating state-of-the-art translation models from that collection.

Few-Shot Learning Machine Translation +1

A Closer Look at Parameter Contributions When Training Neural Language and Translation Models

no code implementations COLING 2022 Raúl Vázquez, Hande Celikkanat, Vinit Ravishankar, Mathias Creutz, Jörg Tiedemann

We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function.

Causal Language Modeling Language Modelling +3
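
For readers unfamiliar with Loss Change Allocation, a minimal first-order sketch is shown below. It assumes a PyTorch model and optimizer; the function name and interface are illustrative, not taken from the paper's code. The per-parameter contributions g_i · Δθ_i sum to an approximation of the loss change of a single training step.

```python
import torch

def lca_step(model, loss_fn, batch, optimizer):
    # Compute the loss and gradients at the current parameters.
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()

    # Snapshot the gradients and the pre-update parameters.
    grads = [p.grad.detach().clone() for p in model.parameters()]
    before = [p.detach().clone() for p in model.parameters()]

    # Take one ordinary optimizer step.
    optimizer.step()

    # First-order allocation: contribution_i = g_i . (theta_after - theta_before).
    # Summed over all parameters this approximates the total loss change, so
    # negative contributions mark parameters that helped reduce the loss.
    return [
        (g * (p.detach() - b)).sum().item()
        for g, p, b in zip(grads, model.parameters(), before)
    ]
```

Accumulating these per-parameter (or per-layer) contributions over training gives the fine-grained view of learning dynamics that the abstract refers to.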

An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation

no code implementations EMNLP 2021 Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann

In this paper, we investigate the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross-attention head with word alignment supervision to strengthen its focus on the target language label.

Machine Translation Translation +1
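
A guided-alignment loss of the kind the abstract describes can be sketched as follows. This is a hedged illustration, assuming attention probabilities from one designated cross-attention head and 0/1 gold alignment matrices; the paper's exact formulation may differ.

```python
import torch

def alignment_loss(head_attn, gold_align, eps=1e-9):
    # head_attn: (batch, tgt_len, src_len) attention probabilities of the
    # supervised cross-attention head.
    # gold_align: 0/1 gold word alignments of the same shape.
    gold = gold_align.float()

    # Normalise the gold alignments into a distribution over source positions;
    # target words with no gold alignment contribute zero.
    target = gold / gold.sum(dim=-1, keepdim=True).clamp(min=1.0)

    # Cross-entropy pulling the head's attention towards the gold alignments.
    return -(target * (head_attn + eps).log()).sum(dim=-1).mean()
```

Added to the usual translation loss with a weighting factor, a term like this constrains the chosen head while the remaining attention heads stay free.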

Creating an Aligned Russian Text Simplification Dataset from Language Learner Data

no code implementations EACL (BSNLP) 2021 Anna Dmitrieva, Jörg Tiedemann

Parallel language corpora where regular texts are aligned with their simplified versions can be used in both natural language processing and theoretical linguistic studies.

Text Simplification

LSDC – A comprehensive dataset for Low Saxon Dialect Classification

no code implementations VarDial (COLING) 2020 Janine Siewert, Yves Scherrer, Martijn Wieling, Jörg Tiedemann

We present a new comprehensive dataset for the unstandardised West-Germanic language Low Saxon covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper.

Classification

Modeling Noise in Paraphrase Detection

no code implementations LREC 2022 Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz

Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training.

OPUS-MT – Building open translation services for the World

no code implementations EAMT 2020 Jörg Tiedemann, Santhosh Thottingal

This paper presents OPUS-MT, a project that focuses on the development of free resources and tools for machine translation.

Machine Translation Translation

Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning?

no code implementations 25 Mar 2024 Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann

We furthermore provide evidence, through similarity measures and investigation of parameters, that this lack of positive influence is due to output separability, which we argue is useful for machine translation but detrimental elsewhere.

Cross-Lingual Transfer Machine Translation +5

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations 20 Mar 2024 Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling Machine Translation +2

SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

no code implementations 12 Mar 2024 Timothee Mickus, Elaine Zosa, Raúl Vázquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, Marianna Apidianaki

This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate.

Machine Translation Paraphrase Generation

Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health

no code implementations 20 Apr 2023 Shaoxiong Ji, Tianlin Zhang, Kailai Yang, Sophia Ananiadou, Erik Cambria, Jörg Tiedemann

In the mental health domain, domain-specific language models have been pretrained and released, facilitating the early detection of mental health conditions.

Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging

1 code implementation 10 Apr 2023 Aarne Talman, Hande Celikkanat, Sami Virpioja, Markus Heinonen, Jörg Tiedemann

This paper introduces Bayesian uncertainty modeling using Stochastic Weight Averaging-Gaussian (SWAG) in Natural Language Understanding (NLU) tasks.

Natural Language Inference Natural Language Understanding
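
As background, a diagonal-only SWAG variant can be sketched in a few lines; the full method also keeps a low-rank covariance term, and the class and method names here are illustrative rather than the paper's code.

```python
import torch

class DiagonalSWAG:
    """Diagonal SWAG sketch: track running first and second moments of the
    weights along the SGD trajectory, then sample weight sets from
    N(mean, diag(second_moment - mean**2)) at test time."""

    def __init__(self, model):
        self.n = 0
        self.mean = [torch.zeros_like(p) for p in model.parameters()]
        self.sq_mean = [torch.zeros_like(p) for p in model.parameters()]

    def collect(self, model):
        # Update the running moments with the current weights.
        self.n += 1
        for m, s, p in zip(self.mean, self.sq_mean, model.parameters()):
            m += (p.detach() - m) / self.n
            s += (p.detach() ** 2 - s) / self.n

    def sample(self, model):
        # Draw one weight sample and load it into the model in place.
        for m, s, p in zip(self.mean, self.sq_mean, model.parameters()):
            var = torch.clamp(s - m ** 2, min=1e-30)
            p.data.copy_(m + var.sqrt() * torch.randn_like(m))
```

Moments are typically collected once per epoch late in training; at test time, predictions are averaged over several sampled weight sets to obtain uncertainty estimates.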

Democratizing Neural Machine Translation with OPUS-MT

no code implementations 4 Dec 2022 Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.

Machine Translation Translation

When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

no code implementations COLING 2022 Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

Our results show that the model correctly detects whether an utterance is humorous 78% of the time, and predicts how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.

Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets

no code implementations RANLP 2013 Jörg Tiedemann, Preslav Nakov

This paper provides an analysis of character-level machine translation models used in pivot-based translation when applied to sparse and noisy datasets, such as crowdsourced movie subtitles.

Machine Translation Translation

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

1 code implementation NoDaLiDa 2021 Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis, Jörg Tiedemann

We propose a new diagnostic test suite which makes it possible to assess whether a dataset constitutes a good testbed for evaluating models' meaning understanding capabilities.

Natural Language Inference Sentence

The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT

1 code implementation 13 Oct 2020 Jörg Tiedemann

This paper describes the development of a new benchmark for machine translation that provides training and test data for thousands of language pairs covering over 500 languages and tools for creating state-of-the-art translation models from that collection.

Few-Shot Learning Machine Translation +1

Multimodal Machine Translation through Visuals and Speech

no code implementations 28 Nov 2019 Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.

Image Captioning Multimodal Machine Translation +4

The University of Helsinki submissions to the WMT19 news translation task

no code implementations WS 2019 Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.

Sentence Translation

What do Language Representations Really Represent?

no code implementations CL 2019 Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein

If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations.

Language Modelling Translation

Multilingual NMT with a language-independent attention bridge

1 code implementation WS 2019 Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, Mathias Creutz

In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by means of incorporating an intermediate attention bridge that is shared across all languages.

NMT Sentence +2
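
The idea of the attention bridge can be illustrated with standard components. The sketch below uses PyTorch's multi-head attention with a small set of learned queries; the paper's formulation is based on structured self-attention, so treat this as an approximation under that assumption.

```python
import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    """k learned query vectors attend over the encoder states of any source
    language, yielding a fixed-size, language-independent representation
    that the decoders of all languages share."""

    def __init__(self, d_model: int, k: int = 10):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(k, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)

    def forward(self, enc_states, pad_mask=None):
        # enc_states: (batch, src_len, d_model) -> bridge: (batch, k, d_model)
        q = self.queries.unsqueeze(0).expand(enc_states.size(0), -1, -1)
        bridge, _ = self.attn(q, enc_states, enc_states,
                              key_padding_mask=pad_mask)
        return bridge
```

Because the bridge output has a fixed shape regardless of source length or language, it can serve both as the decoder's attention memory and as a multilingual sentence representation.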

Sentence Embeddings in NLI with Iterative Refinement Encoders

1 code implementation 27 Aug 2018 Aarne Talman, Anssi Yli-Jyrä, Jörg Tiedemann

We can show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks.

Natural Language Inference Sentence +3

Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

no code implementations WS 2019 Jörg Tiedemann, Yves Scherrer

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones.

NMT Paraphrase Generation +1

Emerging Language Spaces Learned From Massively Multilingual Corpora

no code implementations 1 Feb 2018 Jörg Tiedemann

Translations capture important information about languages that can be used as implicit supervision in learning linguistic properties and semantic representations.

Machine Translation Translation

Cross-Lingual Dependency Parsing for Closely Related Languages – Helsinki's Submission to VarDial 2017

no code implementations 18 Aug 2017 Jörg Tiedemann

This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017.

Dependency Parsing Translation

Neural machine translation for low-resource languages

no code implementations 18 Aug 2017 Robert Östling, Jörg Tiedemann

Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output.

Machine Translation NMT +2

Continuous multilinguality with language vectors

no code implementations 22 Dec 2016 Robert Östling, Jörg Tiedemann

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other.

Language Modelling
