Multimodal Machine Translation
34 papers with code • 3 benchmarks • 5 datasets
Multimodal machine translation is the task of performing machine translation with multiple data sources, for example translating "a bird is flying over water" together with an image of a bird over water into German text.
(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
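To make the input-output contract concrete, here is a minimal PyTorch sketch of one common fusion recipe: a global image feature is projected into the text embedding space and prepended to the source tokens as an extra "visual token". The sizes, the concatenation-based fusion, and the `ToyMultimodalEncoder` name are illustrative assumptions, not any particular published model.

```python
import torch
import torch.nn as nn

class ToyMultimodalEncoder(nn.Module):
    """Hypothetical encoder: fuses one pooled image feature with source tokens."""
    def __init__(self, vocab_size=1000, d_model=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)  # map image feature into text space
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, src_tokens, img_feat):
        tok = self.embed(src_tokens)                # (B, T, d_model)
        img = self.img_proj(img_feat).unsqueeze(1)  # (B, 1, d_model) "visual token"
        return self.encoder(torch.cat([img, tok], dim=1))  # joint representation

enc = ToyMultimodalEncoder()
src = torch.randint(0, 1000, (2, 7))  # e.g. "a bird is flying over water ."
img = torch.randn(2, 2048)            # pooled CNN feature of the paired image
print(enc(src, img).shape)            # torch.Size([2, 8, 256])
```

A translation decoder would then attend over this joint sequence to produce the target-language text.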
Latest papers
Neural Machine Translation with Phrase-Level Universal Visual Representations
Multimodal machine translation (MMT) aims to improve neural machine translation (NMT) with additional visual information, but most existing MMT methods require paired input of source sentence and image, which makes them suffer from the shortage of sentence-image pairs.
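A hedged sketch of the retrieval idea the title suggests: visual vectors are pre-computed per phrase from an external image-caption corpus and looked up for the source sentence at translation time, so no paired image is needed. The table contents, the n-gram matching, and the averaging scheme below are toy assumptions.

```python
import numpy as np

# assumed pre-built table: phrase -> mean visual feature over images whose
# captions contain that phrase (random stand-ins here, not real features)
visual_table = {
    "a bird": np.random.randn(2048),
    "flying over": np.random.randn(2048),
    "water": np.random.randn(2048),
}

def retrieve_phrase_visuals(sentence, table, max_n=2):
    """Collect visual vectors for every known n-gram in the sentence."""
    words = sentence.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i : i + n])
            if phrase in table:
                feats.append(table[phrase])
    # fall back to a zero vector if no phrase matched
    return np.mean(feats, axis=0) if feats else np.zeros(2048)

vis = retrieve_phrase_visuals("A bird is flying over water", visual_table)
print(vis.shape)  # (2048,) -- usable wherever an MMT model expects an image feature
```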
On Vision Features in Multimodal Machine Translation
Previous work on multimodal machine translation (MMT) has focused on how to incorporate vision features into translation, but little attention has been paid to the quality of the vision models themselves.
MSCTD: A Multimodal Sentiment Chat Translation Dataset
In this work, we introduce a new task named Multimodal Chat Translation (MCT), aiming to generate more accurate translations with the help of the associated dialogue history and visual context.
VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation
Existing multimodal machine translation (MMT) datasets consist of images and video captions or general subtitles, which rarely contain linguistic ambiguity, making visual information less effective for generating appropriate translations.
Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models
Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available.
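One way to make such a claim testable is an incongruence check: if a model truly uses the image, its score should drop when each sentence is paired with a mismatched image. The sketch below is a generic version of that idea; `score_fn` is a hypothetical stand-in for any sentence-level quality score from a real MMT system.

```python
import random

def congruence_gap(pairs, score_fn, seed=0):
    """pairs: list of (sentence, image_feature). Returns the average score
    difference between matched and shuffled pairings; a gap near zero
    suggests the model is ignoring the image."""
    matched = sum(score_fn(s, img) for s, img in pairs) / len(pairs)
    images = [img for _, img in pairs]
    random.Random(seed).shuffle(images)  # break the sentence-image alignment
    shuffled = sum(score_fn(s, img) for (s, _), img in zip(pairs, images)) / len(pairs)
    return matched - shuffled

# toy check with a dummy scorer that ignores the image entirely
gap = congruence_gap([("a bird", 1), ("a dog", 2)], lambda s, img: 0.5)
print(gap)  # 0.0 -- this "model" is insensitive to the visual input
```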
BERTGEN: Multi-task Generation through BERT
We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing multimodal and multilingual pretrained models VL-BERT and M-BERT, respectively.
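The snippet below sketches one plausible reading of BERT-style decoder-only generation: append a [MASK], predict it, commit the token, repeat. It uses plain bert-base-uncased from the Hugging Face transformers library as a stand-in for the fused VL-BERT/M-BERT model of the paper, so the continuation is only illustrative.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

text = "a bird is flying over"
ids = tok(text, return_tensors="pt")["input_ids"][0, :-1]  # drop trailing [SEP]
for _ in range(3):  # generate three tokens, one [MASK] prediction at a time
    inp = torch.cat([ids, torch.tensor([tok.mask_token_id])]).unsqueeze(0)
    with torch.no_grad():
        logits = model(input_ids=inp).logits
    next_id = logits[0, -1].argmax()          # greedy choice for the [MASK] slot
    ids = torch.cat([ids, next_id.unsqueeze(0)])
print(tok.decode(ids))
```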
ViTA: Visual-Linguistic Translation by Aligning Object Tags
Multimodal Machine Translation (MMT) enriches the source text with visual information for translation.
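A minimal sketch of the object-tag idea: instead of feeding raw image features, detected object labels are appended to the source sentence, and an ordinary text-only NMT model translates the enriched input. The detector output and the separator token here are illustrative assumptions.

```python
def enrich_with_tags(src_sentence, object_tags, sep="[SEP]"):
    """Append detected object tags to the source sentence as extra context."""
    return f"{src_sentence} {sep} {' '.join(object_tags)}"

enriched = enrich_with_tags("a bird is flying over water", ["bird", "water", "sky"])
print(enriched)  # "a bird is flying over water [SEP] bird water sky"
# the enriched string can now be fed to any text-only translation model
```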
Cultural and Geographical Influences on Image Translatability of Words across Languages
We find that images of words are not always invariant across languages, and that language pairs with a shared culture (a common language family, ethnicity, or religion) show better image translatability (i.e., more similar images for similar words) than pairs without one, regardless of geographic proximity.
Cross-lingual Visual Pre-training for Multimodal Machine Translation
Pre-trained language models have been shown to improve performance in many natural language tasks substantially.
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
In particular, we represent the input image with global and regional visual features and introduce two parallel DCCNs to model multimodal context vectors from visual features at different granularities.
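A simplified sketch of context-guided multimodal fusion: the decoder's current hidden state acts as a query over regional image features, and the result is combined with the global feature. The paper's dynamic capsule routing is replaced here by plain dot-product attention, so this only illustrates the two-granularity design, not the DCCN itself.

```python
import torch
import torch.nn.functional as F

def context_guided_fusion(dec_state, global_feat, regional_feats):
    """dec_state: (B, d); global_feat: (B, d); regional_feats: (B, R, d)."""
    # attention weights of the decoder state over R regional features
    attn = F.softmax(torch.bmm(regional_feats, dec_state.unsqueeze(2)), dim=1)
    regional_ctx = (attn * regional_feats).sum(dim=1)      # (B, d)
    return torch.cat([global_feat, regional_ctx], dim=-1)  # (B, 2d) context

B, R, d = 2, 36, 256
ctx = context_guided_fusion(torch.randn(B, d), torch.randn(B, d),
                            torch.randn(B, R, d))
print(ctx.shape)  # torch.Size([2, 512])
```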