Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of translating text with the help of one or more additional modalities, such as images. For example, a system might translate the English sentence "a bird is flying over water", paired with an image of a bird over water, into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)

Seamless: Multilingual Expressive and Streaming Speech Translation

facebookresearch/seamless_communication 8 Dec 2023

In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.

9,982

Video-Helpful Multimodal Machine Translation

ku-nlp/video-helpful-mmt 31 Oct 2023

In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation.

2

Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs

libeineu/mmt-vqa 26 Oct 2023

This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete.

4

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

ictnlp/sammt 20 Oct 2023

Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation.

3

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

devaansh100/cliptrans ICCV 2023

Simultaneously, there has been an influx of multilingual pre-trained models for NMT and multimodal pre-trained models for vision-language tasks, primarily in English, which have shown exceptional generalisation ability.

16
29 Aug 2023

BigVideo: A Large-scale Video Subtitle Translation Dataset for Multimodal Machine Translation

deeplearnxmu/bigvideo-vmt 23 May 2023

We also introduce two deliberately designed test sets to verify the necessity of visual information: Ambiguous with the presence of ambiguous words, and Unambiguous in which the text context is self-contained for translation.

11

Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination

scofield7419/ummt-vsh 20 May 2023

In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text image pairs, and tested with only source-text inputs.

7

Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation

matthieufp/vgamt 20 Dec 2022

One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as images.

9

Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation

pengr/ikd-mmt 10 Oct 2022

Thus, in this work, we introduce IKD-MMT, a novel MMT framework to support the image-free inference phase via an inversion knowledge distillation scheme.

30

VALHALLA: Visual Hallucination for Machine Translation

jerryyli/valhalla-nmt CVPR 2022

In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation.

26
31 May 2022