Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources, for example translating "a bird is flying over water" together with an image of a bird over water into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
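
To make the task concrete, here is a minimal sketch of one common MMT setup: a Transformer whose encoder input contains both source-token embeddings and projected image-region features, so the decoder can attend to both modalities. The dimensions and the concatenation-based fusion are illustrative assumptions, not any particular paper's architecture (positional encodings are omitted for brevity).

```python
import torch
import torch.nn as nn

class TinyMMT(nn.Module):
    """Toy multimodal translator: the decoder attends over text + image features."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)  # map visual features to model width
        self.transformer = nn.Transformer(
            d_model, nhead=4, num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_tokens, img_feats, tgt_tokens):
        # Fuse modalities by concatenating projected image regions onto the
        # source sequence before encoding.
        memory_in = torch.cat([self.src_emb(src_tokens),
                               self.img_proj(img_feats)], dim=1)
        dec = self.transformer(memory_in, self.tgt_emb(tgt_tokens))
        return self.out(dec)

model = TinyMMT(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),   # source token ids
               torch.randn(2, 36, 2048),         # e.g. 36 region features per image
               torch.randint(0, 1000, (2, 5)))   # shifted target token ids
print(logits.shape)  # torch.Size([2, 5, 1000])
```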

Neural Machine Translation with Phrase-Level Universal Visual Representations

ictnlp/pluvr ACL 2022

Multimodal machine translation (MMT) aims to improve neural machine translation (NMT) with additional visual information, but most existing MMT methods require a paired input of source sentence and image, which makes them suffer from a shortage of sentence-image pairs.

20 stars • 19 Mar 2022
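
A hedged sketch of the general retrieval idea this paper motivates: store visual vectors per phrase and look them up at translation time, so no paired input image is needed. The `visual_table`, the substring matching rule, and the feature size are hypothetical stand-ins, not the actual PLUVR pipeline.

```python
import torch

visual_table = {                     # phrase -> pooled visual feature (toy values)
    "a bird": torch.randn(512),
    "over water": torch.randn(512),
}

def retrieve_visual(sentence: str) -> torch.Tensor:
    """Average the stored visual vectors of all phrases found in the sentence."""
    hits = [v for phrase, v in visual_table.items() if phrase in sentence]
    if not hits:                     # back off to zeros when nothing matches
        return torch.zeros(512)
    return torch.stack(hits).mean(dim=0)

vis = retrieve_visual("a bird is flying over water")
print(vis.shape)  # torch.Size([512])
```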

On Vision Features in Multimodal Machine Translation

libeineu/fairseq_mmt ACL 2022

Previous work on multimodal machine translation (MMT) has focused on ways of incorporating vision features into translation, but little attention has been paid to the quality of the vision models themselves.

41 stars • 17 Mar 2022
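
The experimental knob this line of work studies can be sketched as simply swapping the vision backbone that produces the image features fed to the MT model. The two torchvision backbones below are illustrative choices, and `weights=None` just keeps the example self-contained (no download).

```python
import torch
import torch.nn as nn
from torchvision import models

def make_extractor(name: str) -> nn.Module:
    """Return a feature extractor for the named backbone."""
    if name == "resnet50":
        net = models.resnet50(weights=None)
        return nn.Sequential(*list(net.children())[:-1])  # drop the classifier head
    if name == "vit_b_16":
        net = models.vit_b_16(weights=None)
        net.heads = nn.Identity()                         # keep the CLS representation
        return net
    raise ValueError(name)

img = torch.randn(1, 3, 224, 224)
for name in ("resnet50", "vit_b_16"):
    feats = make_extractor(name)(img)
    print(name, feats.flatten(1).shape)  # resnet50 -> [1, 2048]; vit_b_16 -> [1, 768]
```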

MSCTD: A Multimodal Sentiment Chat Translation Dataset

xl2248/msctd ACL 2022

In this work, we introduce a new task named Multimodal Chat Translation (MCT), aiming to generate more accurate translations with the help of the associated dialogue history and visual context.

40 stars • 28 Feb 2022
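
A minimal sketch of how a chat-translation input might package the associated dialogue history with the current utterance before encoding; the `<sep>` separator and function name are assumptions for illustration, not the MSCTD format.

```python
def build_mct_source(history: list[str], utterance: str) -> str:
    # Older turns first, separated so the encoder can tell turns apart.
    return " <sep> ".join(history + [utterance])

src = build_mct_source(["Hi, how are you?", "Great, you?"], "Look at this photo!")
print(src)
# Hi, how are you? <sep> Great, you? <sep> Look at this photo!
```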

VISA: An Ambiguous Subtitles Dataset for Visual Scene-Aware Machine Translation

ku-nlp/visa LREC 2022

Existing multimodal machine translation (MMT) datasets consist of images and video captions or general subtitles, which rarely contain linguistic ambiguity, making visual information of little use for generating appropriate translations.

9 stars • 20 Jan 2022

Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models

jiaodali/vision-matters-when-it-should EMNLP 2021

Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available.

5 stars • 08 Sep 2021
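
The sanity check this paper performs can be sketched as an incongruent-image probe: if a model truly exploits vision, translation quality should drop when sentences are paired with random images. `translate` and `score` below are hypothetical callables standing in for a real MMT model and metric (e.g., BLEU).

```python
import random

def congruence_gap(translate, score, sentences, images, references):
    """Quality difference between correct and deliberately shuffled image pairings."""
    congruent = [translate(s, img) for s, img in zip(sentences, images)]
    shuffled = images[:]
    random.shuffle(shuffled)             # break the sentence-image pairing
    incongruent = [translate(s, img) for s, img in zip(sentences, shuffled)]
    return score(congruent, references) - score(incongruent, references)

# Usage (with your own model and metric):
# gap = congruence_gap(my_translate, bleu_score, sents, imgs, refs)
# A gap near zero suggests the model is ignoring the visual input.
```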

BERTGEN: Multi-task Generation through BERT

ImperialNLP/BertGen ACL 2021

We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing the multimodal and multilingual pretrained models VL-BERT and M-BERT, respectively.

10 stars • 07 Jun 2021
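
One way to generate text with a BERT-style masked LM, in the spirit of (but not identical to) BERTGEN's decoder-only generation, is to repeatedly append a [MASK] and keep its prediction. The sketch below uses plain multilingual BERT from the transformers library purely for illustration and ignores the visual side entirely.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tok = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
mlm = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

text = "Translation of the caption:"
ids = tok(text, return_tensors="pt")["input_ids"][:, :-1]  # drop [SEP]; we extend instead
for _ in range(5):
    step = torch.cat([ids, torch.tensor([[tok.mask_token_id]])], dim=1)
    logits = mlm(input_ids=step).logits
    next_id = logits[0, -1].argmax().item()   # greedily fill the appended [MASK]
    ids = torch.cat([ids, torch.tensor([[next_id]])], dim=1)
print(tok.decode(ids[0]))
```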

ViTA: Visual-Linguistic Translation by Aligning Object Tags

kshitij98/vita Workshop on Asian Translation 2021

Multimodal Machine Translation (MMT) enriches the source text with visual information for translation.

3 stars • 01 Jun 2021
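
The core idea, enriching the source with visual information expressed as text, can be sketched by appending detected object tags to the source sentence. The tag list and `[TAGS]` marker below are made up for illustration; ViTA's actual tag alignment is more involved.

```python
def add_object_tags(source: str, tags: list[str]) -> str:
    """Append detected object tags so a text-only MT model can see them."""
    return source + " [TAGS] " + " ".join(tags)

print(add_object_tags("a bird is flying over water", ["bird", "water", "sky"]))
# a bird is flying over water [TAGS] bird water sky
```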

Cultural and Geographical Influences on Image Translatability of Words across Languages

nikzadkhani/MMID-CNN-Analysis NAACL 2021

We find that images of words are not always invariant across languages, and that language pairs with a shared culture, meaning a common language family, ethnicity, or religion, show improved image translatability (i.e., more similar images for similar words) compared to pairs without, regardless of geographic proximity.

1 star • 01 Jun 2021
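
One simple way to quantify image translatability is to compare pooled image features for a word across languages. The random feature tensors below are placeholders for real CNN features; the paper's MMID-based analysis is more elaborate.

```python
import torch
import torch.nn.functional as F

feats_en = torch.randn(10, 2048)   # features of 10 images for the English word
feats_de = torch.randn(10, 2048)   # features of 10 images for its German translation

# Higher cosine similarity between the pooled sets = more "translatable" imagery.
sim = F.cosine_similarity(feats_en.mean(0, keepdim=True),
                          feats_de.mean(0, keepdim=True)).item()
print(f"cross-lingual image similarity: {sim:.3f}")
```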

Cross-lingual Visual Pre-training for Multimodal Machine Translation

imperialnlp/vtlm EACL 2021

Pre-trained language models have been shown to substantially improve performance on many natural language tasks.

16 stars • 25 Jan 2021
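
A rough sketch of a translation-language-modeling-style pre-training input, where source and target sentences share one sequence and random tokens are masked for recovery. The token ids, mask id, and the note about image regions are assumptions, not the paper's exact VTLM recipe.

```python
import random

MASK = 0  # placeholder mask id

def mask_tokens(ids: list[int], p: float = 0.15) -> tuple[list[int], list[int]]:
    """Randomly mask tokens; return (corrupted input, labels; -100 = ignored by loss)."""
    labels = [-100] * len(ids)
    out = ids[:]
    for i in range(len(ids)):
        if random.random() < p:
            labels[i] = ids[i]
            out[i] = MASK
    return out, labels

src, tgt = [5, 8, 13], [21, 34, 55]
inp, labels = mask_tokens(src + tgt)   # image-region features would be appended unmasked
print(inp, labels)
```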

Dynamic Context-guided Capsule Network for Multimodal Machine Translation

DeepLearnXMU/MM-DCCN 4 Sep 2020

Particularly, we represent the input image with global and regional visual features, and introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities.

40 stars • 04 Sep 2020
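
A very loose sketch of using visual features at two granularities, with a context vector deciding the mix; this simple gating stand-in deliberately simplifies away the paper's capsule network and dynamic routing.

```python
import torch
import torch.nn as nn

class GranularityFusion(nn.Module):
    """Context-gated mix of global and (pooled) regional visual features."""
    def __init__(self, d=256):
        super().__init__()
        self.gate = nn.Linear(d, 2)

    def forward(self, ctx, global_feat, regional_feats):
        regional = regional_feats.mean(dim=1)          # pool regions to one vector
        w = torch.softmax(self.gate(ctx), dim=-1)      # context decides the mix
        return w[:, :1] * global_feat + w[:, 1:] * regional

fuse = GranularityFusion()
out = fuse(torch.randn(2, 256),        # decoder context
           torch.randn(2, 256),        # global image feature
           torch.randn(2, 36, 256))    # regional image features
print(out.shape)  # torch.Size([2, 256])
```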