Unsupervised Machine Translation
32 papers with code • 9 benchmarks • 4 datasets
Unsupervised machine translation is the task of performing machine translation without any parallel translation resources at training time, typically relying on monolingual corpora alone.
(Image credit: Phrase-Based & Neural Unsupervised Machine Translation)
Libraries
Use these libraries to find Unsupervised Machine Translation models and implementations.

Most implemented papers
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks.
The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task
Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020): we pretrain a monolingual language generation model on German, fine-tune it on both German and Upper Sorbian, and then use it to initialize a UNMT model that is trained with online back-translation.
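To make the core mechanism concrete, below is a minimal sketch of one online back-translation step, with a stub callable standing in for the real target-to-source UNMT model; the function and variable names are illustrative assumptions, not the LMU system's actual code.

```python
# Minimal sketch of one online back-translation step, with a stub
# callable standing in for the real tgt->src UNMT model (illustrative
# names, not the LMU system's code).

from typing import Callable, List, Tuple

def online_backtranslation_step(
    tgt_to_src: Callable[[str], str],  # current tgt->src model (stub)
    tgt_monolingual: List[str],
) -> List[Tuple[str, str]]:
    """Build synthetic (src, tgt) pairs from target-side monolingual text.

    Each target sentence is back-translated into a synthetic source, and
    the pair serves as supervised data for the src->tgt direction. Online
    BT runs this inside the training loop, so the synthetic sources
    improve as the model improves.
    """
    return [(tgt_to_src(t), t) for t in tgt_monolingual]

# Toy usage: a trivial "model" stands in for the real network.
print(online_backtranslation_step(str.lower, ["Hello World."]))
# [('hello world.', 'Hello World.')]
```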
Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation
This work compares different solutions for machine translation on low-resource language pairs, namely zero-shot transfer learning and unsupervised machine translation.
Break-It-Fix-It: Unsupervised Learning for Program Repair
To bridge this gap, we propose a new training approach, Break-It-Fix-It (BIFI), which has two key ideas: (i) we use the critic to check a fixer's output on real bad inputs and add good (fixed) outputs to the training data, and (ii) we train a breaker to generate realistic bad code from good code.
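A minimal sketch of one BIFI data-generation round follows, assuming hypothetical `critic`, `fixer`, and `breaker` callables (in the paper the critic is, e.g., a compiler or parser check, while the fixer and breaker are learned seq2seq models); this sketches the loop's logic, not the authors' implementation.

```python
# Minimal sketch of one Break-It-Fix-It (BIFI) data-generation round.
# `critic`, `fixer`, and `breaker` are hypothetical stand-ins: the fixer
# and breaker would be retrained on the returned data between rounds.

from typing import Callable, List, Tuple

def bifi_round(
    critic: Callable[[str], bool],   # True if the code is "good"
    fixer: Callable[[str], str],     # proposes a fix for bad code
    breaker: Callable[[str], str],   # corrupts good code
    real_bad: List[str],
    real_good: List[str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    fixer_data, breaker_data = [], []
    # (i) Run the fixer on real bad inputs; keep only critic-verified
    #     fixes as new (bad, good) training pairs.
    for bad in real_bad:
        fixed = fixer(bad)
        if critic(fixed):
            fixer_data.append((bad, fixed))
            breaker_data.append((fixed, bad))  # paired data for the breaker
    # (ii) Use the breaker to generate realistic bad code from good code,
    #      keeping only outputs the critic confirms are actually broken.
    for good in real_good:
        broken = breaker(good)
        if not critic(broken):
            fixer_data.append((broken, good))
    return fixer_data, breaker_data

# Toy usage: the "critic" checks for balanced parentheses.
critic = lambda code: code.count("(") == code.count(")")
fixer = lambda code: code + ")" * (code.count("(") - code.count(")"))
breaker = lambda code: code + "("
fd, _ = bifi_round(critic, fixer, breaker, ["f(x"], ["g(x)"])
print(fd)  # [('f(x', 'f(x)'), ('g(x)(', 'g(x)')]
```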
Unsupervised Translation of German–Lower Sorbian: Exploring Training and Novel Transfer Methods on a Low-Resource Language
Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU.
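A tiny sketch of the two orderings being compared, with hypothetical placeholder stages rather than real training code: offline back-translation translates a monolingual corpus once with a frozen model and trains on the fixed result, while online back-translation regenerates pairs on the fly (see the sketch above).

```python
# Tiny sketch of the two stage orderings (hypothetical placeholders,
# not the authors' training code).

def offline_bt(stages):
    # Offline BT: translate a monolingual corpus once with a frozen model,
    # then train on that fixed synthetic corpus.
    return stages + ["offline_bt"]

def online_bt(stages):
    # Online BT: regenerate synthetic pairs on the fly as training proceeds.
    return stages + ["online_bt"]

# Ordering the paper found better for DE->DSB (online first, +2.76 BLEU),
# followed by the alternative it was compared against:
print(offline_bt(online_bt(["init"])))  # ['init', 'online_bt', 'offline_bt']
print(online_bt(offline_bt(["init"])))  # ['init', 'offline_bt', 'online_bt']
```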
Leveraging Automated Unit Tests for Unsupervised Code Translation
With little to no parallel data available for programming languages, unsupervised methods are well-suited to source code translation.
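One way such tests can supply supervision is sketched below with hypothetical helpers (`select_by_unit_tests` and the toy test runner are illustrative, not the paper's API): candidate translations are kept only if they pass the source program's unit tests, and the verified pairs can then be fed back as training data.

```python
# Minimal sketch of test-based filtering (hypothetical helpers, not the
# paper's API): keep a candidate translation only if it passes the unit
# tests generated for the source program.

from typing import Callable, List, Optional

def select_by_unit_tests(
    candidates: List[str],             # candidate target-language programs
    run_tests: Callable[[str], bool],  # True iff all unit tests pass
) -> Optional[str]:
    """Return the first test-passing candidate, if any.

    Verified (source, candidate) pairs can be added back to the training
    data, yielding supervision without any parallel code corpus.
    """
    for cand in candidates:
        if run_tests(cand):
            return cand
    return None

# Toy usage: the "tests" just check that the program defines `add`.
best = select_by_unit_tests(
    ["def sub(a, b): return a - b", "def add(a, b): return a + b"],
    lambda prog: "def add" in prog,
)
print(best)  # def add(a, b): return a + b
```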
Refining Low-Resource Unsupervised Translation by Language Disentanglement of Multilingual Model
Numerous recent works on unsupervised machine translation (UMT) imply that competent unsupervised translation of low-resource and unrelated languages, such as Nepali or Sinhala, is only possible if the model is trained in a massive multilingual environment, where these low-resource languages are mixed with high-resource counterparts.
Unsupervised Mandarin-Cantonese Machine Translation
Advances in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which little parallel data is available.
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation
Neural machine translation (NMT) has progressed rapidly over the past several years, and modern models are able to achieve relatively high quality using only monolingual text data, an approach dubbed Unsupervised Machine Translation (UNMT).
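As one illustration of lexicon-based augmentation, here is a minimal word-level code-switching sketch loosely in the spirit of the paper; the lexicon, whitespace tokenization, and function names are illustrative assumptions, not the paper's exact augmentation recipe.

```python
# Minimal word-level code-switching sketch (illustrative assumptions,
# not the paper's exact method).

from typing import Dict, List, Tuple

def lexicon_codeswitch(
    sentences: List[str], bilex: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Substitute dictionary entries in monolingual text, producing noisy
    pseudo-parallel pairs that help anchor a shared cross-lingual space."""
    pairs = []
    for sent in sentences:
        switched = [bilex.get(tok.lower(), tok) for tok in sent.split()]
        pairs.append((sent, " ".join(switched)))
    return pairs

# Toy usage with a two-entry German->English lexicon:
print(lexicon_codeswitch(["Hund beißt Mann"], {"hund": "dog", "mann": "man"}))
# [('Hund beißt Mann', 'dog beißt man')]
```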
Weakly-supervised Deep Cognate Detection Framework for Low-Resourced Languages Using Morphological Knowledge of Closely-Related Languages
We train an encoder to acquire morphological knowledge of a language and transfer that knowledge to perform unsupervised and weakly-supervised cognate detection, with and without a pivot language, for closely-related languages.
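For contrast with the learned approach, here is a minimal orthographic baseline for a weak cognate signal based on normalized edit distance; this is only an illustrative stand-in, not the paper's encoder-based method.

```python
# Minimal orthographic baseline: a weak cognate signal from normalized
# Levenshtein distance (illustrative only; the paper instead trains an
# encoder on morphological knowledge).

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def cognate_score(a: str, b: str) -> float:
    """1.0 for identical words, approaching 0.0 for unrelated ones."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

# Toy usage on a closely related Spanish/Portuguese pair:
print(round(cognate_score("noche", "noite"), 2))  # 0.6
```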