Search Results for author: Jörg Tiedemann

Found 51 papers, 12 papers with code

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation

no code implementations NoDaLiDa 2021 Mikko Aulamo, Sami Virpioja, Yves Scherrer, Jörg Tiedemann

Evaluating the results on an in-domain test set and a small out-of-domain set, we find that RBMT backtranslation clearly outperforms NMT backtranslation on the out-of-domain test set, and slightly even on the in-domain data, although the NMT backtranslation model itself achieved clearly better BLEU scores than the RBMT system.

Machine Translation NMT +2

The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT

no code implementations WMT (EMNLP) 2020 Jörg Tiedemann

This paper describes the development of a new benchmark for machine translation that provides training and test data for thousands of language pairs covering over 500 languages and tools for creating state-of-the-art translation models from that collection.

Few-Shot Learning Machine Translation +1

A Closer Look at Parameter Contributions When Training Neural Language and Translation Models

no code implementations COLING 2022 Raúl Vázquez, Hande Celikkanat, Vinit Ravishankar, Mathias Creutz, Jörg Tiedemann

We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function.

Causal Language Modeling Language Modelling +3
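
For readers unfamiliar with Loss Change Allocation, a minimal first-order sketch is shown below. It assumes a PyTorch model and optimizer; the function name and interface are illustrative, not taken from the paper's code. The per-parameter contributions g_i · Δθ_i sum to an approximation of the loss change of a single training step.

```python
import torch

def lca_step(model, loss_fn, batch, optimizer):
    # Compute the loss and gradients at the current parameters.
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()

    # Snapshot the gradients and the pre-update parameters.
    grads = [p.grad.detach().clone() for p in model.parameters()]
    before = [p.detach().clone() for p in model.parameters()]

    # Take one ordinary optimizer step.
    optimizer.step()

    # First-order allocation: contribution_i = g_i . (theta_after - theta_before).
    # Summed over all parameters this approximates the total loss change, so
    # negative contributions mark parameters that helped reduce the loss.
    return [
        (g * (p.detach() - b)).sum().item()
        for g, p, b in zip(grads, model.parameters(), before)
    ]
```

Accumulating these per-parameter (or per-layer) contributions over training gives the fine-grained view of learning dynamics that the abstract refers to.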

An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation

no code implementations EMNLP 2021 Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann

In this paper, we investigate the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross-attention head with word alignment supervision to strengthen its focus on the target language label.

Machine Translation Translation +1
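
A guided-alignment loss of the kind the abstract describes can be sketched as follows. This is a hedged illustration, assuming attention probabilities from one designated cross-attention head and 0/1 gold alignment matrices; the paper's exact formulation may differ.

```python
import torch

def alignment_loss(head_attn, gold_align, eps=1e-9):
    # head_attn: (batch, tgt_len, src_len) attention probabilities of the
    # supervised cross-attention head.
    # gold_align: 0/1 gold word alignments of the same shape.
    gold = gold_align.float()

    # Normalise the gold alignments into a distribution over source positions;
    # target words with no gold alignment contribute zero.
    target = gold / gold.sum(dim=-1, keepdim=True).clamp(min=1.0)

    # Cross-entropy pulling the head's attention towards the gold alignments.
    return -(target * (head_attn + eps).log()).sum(dim=-1).mean()
```

Added to the usual translation loss with a weighting factor, a term like this constrains the chosen head while the remaining attention heads stay free.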

Creating an Aligned Russian Text Simplification Dataset from Language Learner Data

no code implementations EACL (BSNLP) 2021 Anna Dmitrieva, Jörg Tiedemann

Parallel language corpora where regular texts are aligned with their simplified versions can be used in both natural language processing and theoretical linguistic studies.

Text Simplification

LSDC – A comprehensive dataset for Low Saxon Dialect Classification

no code implementations VarDial (COLING) 2020 Janine Siewert, Yves Scherrer, Martijn Wieling, Jörg Tiedemann

We present a new comprehensive dataset for the unstandardised West-Germanic language Low Saxon covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper.

Classification

Modeling Noise in Paraphrase Detection

no code implementations LREC 2022 Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz

Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training.

OPUS-MT – Building open translation services for the World

no code implementations EAMT 2020 Jörg Tiedemann, Santhosh Thottingal

This paper presents OPUS-MT, a project that focuses on the development of free resources and tools for machine translation.

Machine Translation Translation

Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning?

no code implementations 25 Mar 2024 Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann

We furthermore provide evidence, through similarity measures and investigation of parameters, that this lack of positive influence is due to output separability, which we argue is useful for machine translation but detrimental elsewhere.

Cross-Lingual Transfer Machine Translation +5

A New Massive Multilingual Dataset for High-Performance Language Technologies

no code implementations 20 Mar 2024 Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer Van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive.

Language Modelling Machine Translation +2

SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

no code implementations 12 Mar 2024 Timothee Mickus, Elaine Zosa, Raúl Vázquez, Teemu Vahtola, Jörg Tiedemann, Vincent Segonne, Alessandro Raganato, Marianna Apidianaki

This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet inaccurate.

Machine Translation Paraphrase Generation

Domain-specific Continued Pretraining of Language Models for Capturing Long Context in Mental Health

no code implementations 20 Apr 2023 Shaoxiong Ji, Tianlin Zhang, Kailai Yang, Sophia Ananiadou, Erik Cambria, Jörg Tiedemann

In the mental health domain, domain-specific language models have been pretrained and released, facilitating the early detection of mental health conditions.

Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging

1 code implementation 10 Apr 2023 Aarne Talman, Hande Celikkanat, Sami Virpioja, Markus Heinonen, Jörg Tiedemann

This paper introduces Bayesian uncertainty modeling using Stochastic Weight Averaging-Gaussian (SWAG) in Natural Language Understanding (NLU) tasks.

Natural Language Inference Natural Language Understanding
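
As background, a diagonal-only SWAG variant can be sketched in a few lines; the full method also keeps a low-rank covariance term, and the class and method names here are illustrative rather than the paper's code.

```python
import torch

class DiagonalSWAG:
    """Diagonal SWAG sketch: track running first and second moments of the
    weights along the SGD trajectory, then sample weight sets from
    N(mean, diag(second_moment - mean**2)) at test time."""

    def __init__(self, model):
        self.n = 0
        self.mean = [torch.zeros_like(p) for p in model.parameters()]
        self.sq_mean = [torch.zeros_like(p) for p in model.parameters()]

    def collect(self, model):
        # Update the running moments with the current weights.
        self.n += 1
        for m, s, p in zip(self.mean, self.sq_mean, model.parameters()):
            m += (p.detach() - m) / self.n
            s += (p.detach() ** 2 - s) / self.n

    def sample(self, model):
        # Draw one weight sample and load it into the model in place.
        for m, s, p in zip(self.mean, self.sq_mean, model.parameters()):
            var = torch.clamp(s - m ** 2, min=1e-30)
            p.data.copy_(m + var.sqrt() * torch.randn_like(m))
```

Moments are typically collected once per epoch late in training; at test time, predictions are averaged over several sampled weight sets to obtain uncertainty estimates.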

Democratizing Neural Machine Translation with OPUS-MT

no code implementations 4 Dec 2022 Jörg Tiedemann, Mikko Aulamo, Daria Bakshandaeva, Michele Boggia, Stig-Arne Grönroos, Tommi Nieminen, Alessandro Raganato, Yves Scherrer, Raul Vazquez, Sami Virpioja

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows.

Machine Translation Translation

When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity

no code implementations COLING 2022 Khalid Alnajjar, Mika Hämäläinen, Jörg Tiedemann, Jorma Laaksonen, Mikko Kurimo

Our results show that the model correctly detects whether an utterance is humorous 78% of the time, and predicts how long the audience's laughter reaction should last with a mean absolute error of 600 milliseconds.

Analyzing the Use of Character-Level Translation with Sparse and Noisy Datasets

no code implementations RANLP 2013 Jörg Tiedemann, Preslav Nakov

This paper provides an analysis of character-level machine translation models used in pivot-based translation when applied to sparse and noisy datasets, such as crowdsourced movie subtitles.

Machine Translation Translation

NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance

1 code implementation NoDaLiDa 2021 Aarne Talman, Marianna Apidianaki, Stergios Chatzikyriakidis, Jörg Tiedemann

We propose a new diagnostic test suite which makes it possible to assess whether a dataset constitutes a good testbed for evaluating models' meaning understanding capabilities.

Natural Language Inference Sentence

The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT

1 code implementation 13 Oct 2020 Jörg Tiedemann

This paper describes the development of a new benchmark for machine translation that provides training and test data for thousands of language pairs covering over 500 languages and tools for creating state-of-the-art translation models from that collection.

Few-Shot Learning Machine Translation +1

Multimodal Machine Translation through Visuals and Speech

no code implementations 28 Nov 2019 Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.

Image Captioning Multimodal Machine Translation +4

The University of Helsinki submissions to the WMT19 news translation task

no code implementations WS 2019 Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.

Sentence Translation

What do Language Representations Really Represent?

no code implementations CL 2019 Johannes Bjerva, Robert Östling, Maria Han Veiga, Jörg Tiedemann, Isabelle Augenstein

If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations.

Language Modelling Translation

Multilingual NMT with a language-independent attention bridge

1 code implementation WS 2019 Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, Mathias Creutz

In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by means of incorporating an intermediate attention bridge that is shared across all languages.

NMT Sentence +2
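
The idea of the attention bridge can be illustrated with standard components. The sketch below uses PyTorch's multi-head attention with a small set of learned queries; the paper's formulation is based on structured self-attention, so treat this as an approximation under that assumption.

```python
import torch
import torch.nn as nn

class AttentionBridge(nn.Module):
    """k learned query vectors attend over the encoder states of any source
    language, yielding a fixed-size, language-independent representation
    that the decoders of all languages share."""

    def __init__(self, d_model: int, k: int = 10):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(k, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)

    def forward(self, enc_states, pad_mask=None):
        # enc_states: (batch, src_len, d_model) -> bridge: (batch, k, d_model)
        q = self.queries.unsqueeze(0).expand(enc_states.size(0), -1, -1)
        bridge, _ = self.attn(q, enc_states, enc_states,
                              key_padding_mask=pad_mask)
        return bridge
```

Because the bridge output has a fixed shape regardless of source length or language, it can serve both as the decoder's attention memory and as a multilingual sentence representation.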

Sentence Embeddings in NLI with Iterative Refinement Encoders

1 code implementation 27 Aug 2018 Aarne Talman, Anssi Yli-Jyrä, Jörg Tiedemann

We can show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks.

Natural Language Inference Sentence +3

Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks

no code implementations WS 2019 Jörg Tiedemann, Yves Scherrer

In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones.

NMT Paraphrase Generation +1

Emerging Language Spaces Learned From Massively Multilingual Corpora

no code implementations 1 Feb 2018 Jörg Tiedemann

Translations capture important information about languages that can be used as implicit supervision in learning linguistic properties and semantic representations.

Machine Translation Translation

Cross-Lingual Dependency Parsing for Closely Related Languages – Helsinki's Submission to VarDial 2017

no code implementations 18 Aug 2017 Jörg Tiedemann

This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017.

Dependency Parsing Translation

Neural machine translation for low-resource languages

no code implementations 18 Aug 2017 Robert Östling, Jörg Tiedemann

Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output.

Machine Translation NMT +2

Continuous multilinguality with language vectors

no code implementations 22 Dec 2016 Robert Östling, Jörg Tiedemann

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other.

Language Modelling
