Search Results for author: Alexander Fraser

Found 50 papers, 15 papers with code

Findings of the WMT 2021 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT

no code implementations • WMT (EMNLP) 2021 • Jindřich Libovický, Alexander Fraser

We present the findings of the WMT2021 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT.

Don’t Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations • LREC (BUCC) 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

The two approaches we find most effective are: 1) using identical words as seed lexicons (which unsupervised approaches incorrectly assume are not available for orthographically distinct language pairs) and 2) combining such lexicons with pairs extracted by matching romanized versions of words with an edit distance threshold.

Tasks: Cross-Lingual Transfer, Word Embeddings
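
The two cheap signals named in the snippet above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_romanize` and its Cyrillic-to-Latin table are hypothetical stand-ins for a real romanizer, the vocabularies are toy lists, and the threshold `max_dist=1` is arbitrary.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def seed_lexicon(src_vocab, tgt_vocab, romanize, max_dist=1):
    """Seed pairs from (1) identical strings and (2) near-identical romanized forms."""
    tgt_set = set(tgt_vocab)
    pairs = {(w, w) for w in src_vocab if w in tgt_set}      # signal 1: identical words
    matched = {s for s, _ in pairs}
    romanized_tgt = [(romanize(t), t) for t in tgt_vocab]
    for s in src_vocab:
        if s in matched:
            continue
        rs = romanize(s)
        for rt, t in romanized_tgt:
            if levenshtein(rs, rt) <= max_dist:              # signal 2: edit-distance threshold
                pairs.add((s, t))
    return sorted(pairs)


# Hypothetical toy romanizer: maps a few Cyrillic letters to Latin, leaves others unchanged.
CYR_TO_LAT = {"а": "a", "б": "b", "д": "d", "и": "i", "к": "k",
              "м": "m", "н": "n", "о": "o", "р": "r"}


def toy_romanize(word):
    return "".join(CYR_TO_LAT.get(c, c) for c in word)


lexicon = seed_lexicon(["radio", "bank", "covid", "house"],
                       ["радио", "банк", "covid", "дом"], toy_romanize)
print(lexicon)  # [('bank', 'банк'), ('covid', 'covid'), ('radio', 'радио')]
```

The toy example shows both signals. "covid" is found as an identical string even across scripts. "радио" and "банк" are paired through their romanized forms. "house" gets no pair because no romanized target word lies within the threshold.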

Why don’t people use character-level machine translation?

no code implementations • Findings (ACL) 2022 • Jindřich Libovický, Helmut Schmid, Alexander Fraser

We present a literature and empirical survey that critically assesses the state of the art in character-level modeling for machine translation (MT).

Tasks: Machine Translation, Translation

The LMU Munich Systems for the WMT21 Unsupervised and Very Low-Resource Translation Task

no code implementations • WMT (EMNLP) 2021 • Jindřich Libovický, Alexander Fraser

We present our submissions to the WMT21 shared task in Unsupervised and Very Low Resource machine translation between German and Upper Sorbian, German and Lower Sorbian, and Russian and Chuvash.

Tasks: Machine Translation, Translation

Cross-Lingual Transfer Learning for Hate Speech Detection

no code implementations • EACL (LTEDI) 2021 • Irina Bigoulaeva, Viktor Hangya, Alexander Fraser

Rather than collecting and annotating new hate speech data, we show how to use cross-lingual transfer learning to leverage already existing data from higher-resource languages.

Tasks: Cross-Lingual Transfer, Hate Speech Detection (+2 more)

Towards Handling Compositionality in Low-Resource Bilingual Word Induction

no code implementations • AMTA 2020 • Viktor Hangya, Alexander Fraser

Adapting Entities across Languages and Cultures

no code implementations • Findings (EMNLP) 2021 • Denis Peskov, Viktor Hangya, Jordan Boyd-Graber, Alexander Fraser

Bill Gates is associated with founding a company in the United States, so perhaps the German founder Carl Benz could stand in for Gates in those contexts.

Tasks: Machine Translation, Question Answering (+1 more)

The LMU Munich System for the WMT20 Very Low Resource Supervised MT Task

no code implementations • WMT (EMNLP) 2020 • Jindřich Libovický, Viktor Hangya, Helmut Schmid, Alexander Fraser

We present our systems for the WMT20 Very Low Resource MT Task for translation between German and Upper Sorbian.

Tasks: Transfer Learning, Translation

The LMU Munich System for the WMT 2021 Large-Scale Multilingual Machine Translation Shared Task

no code implementations • WMT (EMNLP) 2021 • Wen Lai, Jindřich Libovický, Alexander Fraser

This paper describes the submission of LMU Munich to the WMT 2021 multilingual machine translation task for small track #1, which studies translation between 6 languages (Croatian, Hungarian, Estonian, Serbian, Macedonian, English) in 30 directions.

Tasks: Data Augmentation, Knowledge Distillation (+2 more)

Improving Machine Translation of Rare and Unseen Word Senses

no code implementations • WMT (EMNLP) 2021 • Viktor Hangya, Qianchu Liu, Dario Stojanovski, Alexander Fraser, Anna Korhonen

The performance of NMT systems has improved drastically in the past few years but the translation of multi-sense words still poses a challenge.

Tasks: Bilingual Lexicon Induction, NMT (+3 more)

Do not neglect related languages: The case of low-resource Occitan cross-lingual word embeddings

no code implementations • EMNLP (MRL) 2021 • Lisa Woller, Viktor Hangya, Alexander Fraser

In contrast to previous approaches which leverage independently pre-trained embeddings of languages, we (i) train CLWEs for the low-resource and a related language jointly and (ii) map them to the target language to build the final multilingual space.

Tasks: Bilingual Lexicon Induction, Cross-Lingual Word Embeddings (+1 more)
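
Step (ii) maps the jointly trained space to the target language. This is commonly done with an orthogonal linear map fitted on a seed dictionary. The sketch below assumes that choice (orthogonal Procrustes), since the snippet does not say which mapping the paper uses. The embeddings, the seed indices and the rotation are all synthetic.

```python
import numpy as np


def procrustes_map(x_seed, y_seed):
    """Orthogonal W minimising ||x_seed @ W - y_seed||_F (closed form via SVD)."""
    u, _, vt = np.linalg.svd(x_seed.T @ y_seed)
    return u @ vt


rng = np.random.default_rng(0)
dim = 5
# Synthetic stand-in for the jointly trained space (low-resource + related-language rows).
joint = rng.normal(size=(200, dim))
# Pretend the target-language space is an unknown rotation of the joint space.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target = joint @ true_rotation
# Hypothetical seed lexicon: the first 50 rows have known target-language translations.
seed = np.arange(50)
W = procrustes_map(joint[seed], target[seed])
# Map every row, including words that were not in the seed lexicon.
mapped = joint @ W
print(np.allclose(mapped, target))  # True
```

Because the map is fitted once on the seed pairs and then applied to the whole joint space, low-resource words without any dictionary entry still land in the target space.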

Findings of the WMT 2020 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT

no code implementations • WMT (EMNLP) 2020 • Alexander Fraser

We describe the WMT 2020 Shared Tasks in Unsupervised MT and Very Low Resource Supervised MT.

Tasks: Machine Translation

Unsupervised Parallel Sentence Extraction from Comparable Corpora

no code implementations • IWSLT (EMNLP) 2018 • Viktor Hangya, Fabienne Braune, Yuliya Kalasouskaya, Alexander Fraser

We show that our approach is effective on three language pairs without the use of any bilingual signal, which is important because parallel sentence mining is most useful in low-resource scenarios.

Tasks: Sentence, Word Embeddings

Labeled Morphological Segmentation with Semi-Markov Models

no code implementations • CONLL 2015 • Ryan Cotterell, Thomas Müller, Alexander Fraser, Hinrich Schütze

We present labeled morphological segmentation, an alternative view of morphological processing that unifies several tasks.

Tasks: Segmentation, TAG

Understanding Cross-Lingual Alignment -- A Survey

no code implementations • 9 Apr 2024 • Katharina Hämmerl, Jindřich Libovický, Alexander Fraser

Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, has been an active field of research in recent years.

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You

1 code implementation • 29 Jan 2024 • Felix Friedrich, Katharina Hämmerl, Patrick Schramowski, Jindřich Libovický, Kristian Kersting, Alexander Fraser

Text-to-image generation models have recently achieved astonishing results in image quality, flexibility, and text alignment and are consequently employed in a fast-growing number of applications.

Tasks: Multilingual Text-to-Image Generation, Prompt Engineering (+1 more)

Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

no code implementations • 21 Nov 2023 • Viktor Hangya, Silvia Severini, Radoslav Ralev, Alexander Fraser, Hinrich Schütze

In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach that incorporates intermediate related languages to bridge the gap between the distant source and target.

Tasks: Bilingual Lexicon Induction, Multilingual NLP (+1 more)

Extending Multilingual Machine Translation through Imitation Learning

no code implementations • 14 Nov 2023 • Wen Lai, Viktor Hangya, Alexander Fraser

Despite the growing variety of languages supported by existing multilingual neural machine translation (MNMT) models, most of the world's languages are still being left behind.

Tasks: Imitation Learning, Machine Translation (+1 more)

Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity

1 code implementation • 1 Jun 2023 • Katharina Hämmerl, Alina Fastowski, Jindřich Libovický, Alexander Fraser

We investigate outlier dimensions and their relationship to anisotropy in multiple pre-trained multilingual language models.
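
Two simple diagnostics for these properties are sketched below under stated assumptions. Anisotropy is measured as the mean cosine similarity between all pairs of embeddings. Outlier dimensions are flagged as those whose mean activation lies more than `k` standard deviations from the average dimension mean. That is a common heuristic, and the paper's exact criterion may differ. The embeddings are synthetic, with one dimension artificially shifted.

```python
import numpy as np


def mean_pairwise_cosine(emb):
    """Average cosine similarity over all distinct pairs of rows (near 0 = isotropic)."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(emb)
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))


def outlier_dimensions(emb, k=3.0):
    """Indices of dimensions whose mean is more than k std devs from the mean of all dimension means."""
    means = emb.mean(axis=0)
    return np.flatnonzero(np.abs(means - means.mean()) > k * means.std())


rng = np.random.default_rng(0)
isotropic = rng.normal(size=(500, 64))
skewed = isotropic.copy()
skewed[:, 7] += 20.0  # one artificially large dimension dominates every vector

print(f"{mean_pairwise_cosine(isotropic):.2f} {mean_pairwise_cosine(skewed):.2f}")
print(outlier_dimensions(skewed))  # [7]
```

The example illustrates why the two properties are linked. A single dimension with a large shared offset is enough to push the average cosine similarity of otherwise random vectors far above zero.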

On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss

1 code implementation • 26 May 2023 • Yihong Liu, Alexandra Chronopoulou, Hinrich Schütze, Alexander Fraser

By conducting extensive experiments on different language pairs, including similar and distant, high- and low-resource languages, we find that our method alleviates the copying problem, thus improving the translation performance on low-resource languages.

Tasks: Machine Translation, NMT (+2 more)

How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have

no code implementations • 23 May 2023 • Viktor Hangya, Alexander Fraser

Our analysis also shows that our models acquire a general understanding of abusive language, since they improve the prediction of labels which are present only in the target dataset.

Tasks: Abusive Language

Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation

1 code implementation • 22 May 2023 • Wen Lai, Alexandra Chronopoulou, Alexander Fraser

Despite advances in multilingual neural machine translation (MNMT), we argue that there are still two major challenges in this area: data imbalance and representation degeneration.

Tasks: Contrastive Learning, Machine Translation (+1 more)

AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

no code implementations • 14 Feb 2023 • Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge

We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results.

Tasks: Clustering, Language Modelling (+3 more)
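
For adapters with identical architecture, weight averaging reduces to a parameter-wise (optionally weighted) mean. The following is a minimal sketch that assumes adapters are plain dictionaries of NumPy arrays. The parameter names `down` and `up` are hypothetical, and a real setup would load framework-specific state dicts instead.

```python
import numpy as np


def average_adapters(adapters, coeffs=None):
    """Parameter-wise weighted mean of adapters with identical parameter names and shapes."""
    if coeffs is None:
        coeffs = [1.0 / len(adapters)] * len(adapters)
    if len(coeffs) != len(adapters) or not np.isclose(sum(coeffs), 1.0):
        raise ValueError("need one coefficient per adapter, summing to 1")
    names = adapters[0].keys()
    if any(a.keys() != names for a in adapters):
        raise ValueError("adapters must share the same parameter names")
    return {name: sum(c * a[name] for c, a in zip(coeffs, adapters)) for name in names}


# Two hypothetical adapters, e.g. trained with different learning rates on the same domain.
adapter_a = {"down": np.ones((4, 2)), "up": np.zeros((2, 4))}
adapter_b = {"down": 3 * np.ones((4, 2)), "up": 2 * np.ones((2, 4))}
soup = average_adapters([adapter_a, adapter_b])
print(soup["down"][0], soup["up"][0])  # [2. 2.] [1. 1. 1. 1.]
```

The averaged adapter has the same shape and cost at inference time as a single adapter. This is what makes weight averaging cheaper than ensembling the adapters' outputs.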

Speaking Multiple Languages Affects the Moral Bias of Language Models

1 code implementation • 14 Nov 2022 • Katharina Hämmerl, Björn Deiseroth, Patrick Schramowski, Jindřich Libovický, Constantin A. Rothkopf, Alexander Fraser, Kristian Kersting

Do the models capture moral norms from English and impose them on other languages?

Tasks: Cross-Lingual Transfer

$m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine Translation with a Meta-Adapter

1 code implementation • 21 Oct 2022 • Wen Lai, Alexandra Chronopoulou, Alexander Fraser

We consider a very challenging scenario: adapting the MNMT model both to a new domain and to a new language pair at the same time.

Tasks: Domain Adaptation, Machine Translation (+2 more)

A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

no code implementations • 10 Oct 2022 • Sophie Henning, William Beluch, Alexander Fraser, Annemarie Friedrich

With this survey, the first overview of class imbalance in deep-learning-based NLP, we provide guidance for NLP researchers and practitioners dealing with imbalanced data.

Tasks: Benchmarking, Data Augmentation

Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation

no code implementations • 30 Sep 2022 • Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser

Training a new adapter on each language pair or training a single adapter on all language pairs without updating the pretrained model has been proposed as a parameter-efficient alternative.

Tasks: Cross-Lingual Transfer, Machine Translation (+1 more)

Don't Forget Cheap Training Signals Before Building Unsupervised Bilingual Word Embeddings

no code implementations • 31 May 2022 • Silvia Severini, Viktor Hangya, Masoud Jalili Sabet, Alexander Fraser, Hinrich Schütze

Tasks: Cross-Lingual Transfer, Word Embeddings

Demonstrating CAT: Synthesizing Data-Aware Conversational Agents for Transactional Databases

no code implementations • 26 Mar 2022 • Marius Gassen, Benjamin Hättasch, Benjamin Hilprecht, Nadja Geisler, Alexander Fraser, Carsten Binnig

However, developing a conversational agent (i.e., a chatbot-like interface) to allow end-users to interact with an application using natural language requires both immense amounts of training data and NLP expertise.

Tasks: Chatbot

Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies

no code implementations • 25 Mar 2022 • Marion Weller-Di Marco, Matthias Huck, Alexander Fraser

Key challenges of rich target-side morphology in data-driven machine translation include: (1) a large number of differently inflected word surface forms, which entails a larger vocabulary and thus data sparsity.

Tasks: LEMMA, Machine Translation (+3 more)

Do Multilingual Language Models Capture Differing Moral Norms?

no code implementations • 18 Mar 2022 • Katharina Hämmerl, Björn Deiseroth, Patrick Schramowski, Jindřich Libovický, Alexander Fraser, Kristian Kersting

Massively multilingual sentence representations are trained on large corpora of uncurated data, with a very imbalanced proportion of languages included in the training.

Tasks: Sentence, XLM-R

Combining Static and Contextualised Multilingual Embeddings

1 code implementation • Findings (ACL) 2022 • Katharina Hämmerl, Jindřich Libovický, Alexander Fraser

We combine the strengths of static and contextual models to improve multilingual representations.

Tasks: Retrieval, XLM-R

Addressing the Challenges of Cross-Lingual Hate Speech Detection

no code implementations • 15 Jan 2022 • Irina Bigoulaeva, Viktor Hangya, Iryna Gurevych, Alexander Fraser

The goal of hate speech detection is to filter negative online content targeting certain groups of people.

Tasks: Cross-Lingual Transfer, Cross-Lingual Word Embeddings (+3 more)

Improving Both Domain Robustness and Domain Adaptability in Machine Translation

1 code implementation • COLING 2022 • Wen Lai, Jindřich Libovický, Alexander Fraser

First, we want to reach domain robustness, i.e., we want to reach high quality on both domains seen in the training data and unseen domains.

Tasks: Domain Adaptation, Machine Translation (+3 more)

Why don't people use character-level machine translation?

no code implementations • 15 Oct 2021 • Jindřich Libovický, Helmut Schmid, Alexander Fraser

We present a literature and empirical survey that critically assesses the state of the art in character-level modeling for machine translation (MT).

Tasks: Machine Translation, Translation

Neural String Edit Distance

1 code implementation • spnlp (ACL) 2022 • Jindřich Libovický, Alexander Fraser

We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance.

Tasks: Classification, General Classification (+1 more)
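
The snippet does not show how the edit distance is made learnable. One standard way to make it differentiable is sketched here as an assumption, in the spirit of stochastic edit distance. It keeps Levenshtein's recursion but replaces the minimum over edit operations with a log-sum-exp over all edit sequences (a forward algorithm). In a neural model the operation scores would come from a network. Here `log_sub`, `log_ins` and `log_del` are hypothetical hand-set scores, and they are not normalized probabilities.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_edit_score(a, b, log_sub, log_ins, log_del):
    """Log of the summed score of all edit sequences turning `a` into `b`.

    Same recursion as Levenshtein, but the min over operations becomes log-sum-exp,
    so the result is a smooth function of the operation scores.
    """
    alpha = [[-math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    alpha[0][0] = 0.0
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            terms = []
            if i > 0:
                terms.append(alpha[i - 1][j] + log_del(a[i - 1]))
            if j > 0:
                terms.append(alpha[i][j - 1] + log_ins(b[j - 1]))
            if i > 0 and j > 0:
                terms.append(alpha[i - 1][j - 1] + log_sub(a[i - 1], b[j - 1]))
            alpha[i][j] = logsumexp(terms)
    return alpha[-1][-1]


# Hypothetical hand-set scores; a learned model would predict these instead.
def log_sub(x, y):
    return math.log(0.6) if x == y else math.log(0.05)


def log_ins(y):
    return math.log(0.1)


def log_del(x):
    return math.log(0.1)


print(soft_edit_score("cat", "cat", log_sub, log_ins, log_del) >
      soft_edit_score("cat", "dog", log_sub, log_ins, log_del))  # True
```

Summing over alignments instead of taking the single best one means every operation score receives a gradient. That property is what allows the scores to be trained for string-pair matching or transduction.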

Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation

1 code implementation • NAACL 2021 • Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser

Successful methods for unsupervised neural machine translation (UNMT) employ cross-lingual pretraining via self-supervision, often in the form of a masked language modeling or a sequence generation task, which requires the model to align the lexical- and high-level representations of the two languages.

Tasks: Bilingual Lexicon Induction, Language Modelling (+2 more)

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

no code implementations • COLING 2020 • Dario Stojanovski, Benno Krojer, Denis Peskov, Alexander Fraser

Recent high scores on pronoun translation using context-aware neural machine translation have suggested that current approaches work well.

Tasks: Machine Translation, NMT (+1 more)

Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction

no code implementations • COLING 2020 • Silvia Severini, Viktor Hangya, Alexander Fraser, Hinrich Schütze

In this paper, we enrich BWE-based BDI with transliteration information by using Bilingual Orthography Embeddings (BOEs).

Tasks: Translation, Transliteration (+1 more)

The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task

1 code implementation • WMT (EMNLP) 2020 • Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser

Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation.

Tasks: Text Generation, Translation (+1 more)

Anchor-based Bilingual Word Embeddings for Low-Resource Languages

no code implementations • ACL 2021 • Tobias Eder, Viktor Hangya, Alexander Fraser

For low-resource languages, training MWEs monolingually results in MWEs of poor quality, and thus poor bilingual word embeddings (BWEs) as well.

Tasks: Bilingual Lexicon Induction, Cross-Lingual Transfer (+5 more)

Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT

1 code implementation • EMNLP 2020 • Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser

Using a language model (LM) pretrained on two languages with large monolingual data in order to initialize an unsupervised neural machine translation (UNMT) system yields state-of-the-art results.

Tasks: Language Modelling, Machine Translation (+2 more)

Pragmatic information in translation: a corpus-based study of tense and mood in English and German

no code implementations • 10 Jul 2020 • Anita Ramm, Ekaterina Lapshinova-Koltunski, Alexander Fraser

Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.

Tasks: Machine Translation, Multilingual NLP (+1 more)

Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation

no code implementations • EACL (AdaptNLP) 2021 • Dario Stojanovski, Alexander Fraser

Achieving satisfactory performance in machine translation on domains for which there is no training data is challenging.

Tasks: Domain Adaptation, Machine Translation (+2 more)

Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems

2 code implementations • EMNLP 2020 • Jindřich Libovický, Alexander Fraser

Applying the Transformer architecture on the character level usually requires very deep architectures that are difficult and slow to train.

Tasks: Machine Translation, NMT (+2 more)

On the Language Neutrality of Pre-trained Multilingual Representations

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jindřich Libovický, Rudolf Rosa, Alexander Fraser

Multilingual contextual embeddings, such as multilingual BERT and XLM-RoBERTa, have proved useful for many multilingual tasks.

Tasks: Language Identification, Transfer Learning (+1 more)

How Language-Neutral is Multilingual BERT?

1 code implementation • 8 Nov 2019 • Jindřich Libovický, Rudolf Rosa, Alexander Fraser

Multilingual BERT (mBERT) provides sentence representations for 104 languages, which are useful for many multilingual tasks.

Tasks: Retrieval, Sentence (+2 more)

Embedding Learning Through Multilingual Concept Induction

no code implementations • ACL 2018 • Philipp Dufter, Mengjie Zhao, Martin Schmitt, Alexander Fraser, Hinrich Schütze

We present a new method for estimating vector space representations of words: embedding learning by concept induction.

Tasks: Sentiment Analysis, Word Similarity

Modeling Target-Side Inflection in Neural Machine Translation

no code implementations • WS 2017 • Aleš Tamchyna, Marion Weller-Di Marco, Alexander Fraser

NMT systems have problems with large vocabulary sizes.

Tasks: LEMMA, Machine Translation (+4 more)

Target-Side Context for Discriminative Models in Statistical Machine Translation

no code implementations • ACL 2016 • Aleš Tamchyna, Alexander Fraser, Ondřej Bojar, Marcin Junczys-Dowmunt

Discriminative translation models utilizing source context have been shown to improve statistical machine translation performance.

Tasks: Machine Translation, Translation
