Multimodal Machine Translation

34 papers with code • 3 benchmarks • 5 datasets

Multimodal machine translation is the task of performing machine translation with multiple data sources - for example, translating the sentence "a bird is flying over water" together with an image of a bird over water into German text.

(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
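The sketch below illustrates one common way to wire this up: a standard Transformer translation model whose encoder also sees a projected image feature alongside the source token embeddings. This is a minimal illustration, not any specific published model; the class name, dimensions, and the assumption of a precomputed global image feature (e.g. from a CNN or ViT backbone) are all choices made for the example.

```python
import torch
import torch.nn as nn


class SimpleMultimodalTranslator(nn.Module):
    """Minimal MMT sketch: fuse a global image feature into a text-to-text
    Transformer by prepending it to the source token embeddings."""

    def __init__(self, src_vocab, tgt_vocab, d_model=256, img_dim=2048):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        # Project a precomputed image feature into the token embedding space.
        self.img_proj = nn.Linear(img_dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, img_feat, tgt_ids):
        # src_ids: (B, S), img_feat: (B, img_dim), tgt_ids: (B, T)
        src = self.src_embed(src_ids)                  # (B, S, d_model)
        img = self.img_proj(img_feat).unsqueeze(1)     # (B, 1, d_model)
        src = torch.cat([img, src], dim=1)             # visual "token" first
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask so the decoder cannot look at future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)                        # (B, T, tgt_vocab)


# Toy usage with random tensors.
model = SimpleMultimodalTranslator(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),   # source token ids
               torch.randn(2, 2048),              # image features
               torch.randint(0, 1000, (2, 5)))    # target token ids
print(logits.shape)  # torch.Size([2, 5, 1000])
```

In practice, models in this area differ mainly in how the fusion happens (concatenation, separate cross-attention over region features, gating), but the interface above - source text plus image features in, target logits out - is the common shape of the task.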

Latest papers with no code

Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets

no code yet • 9 Apr 2024

Recent research in the field of multimodal machine translation (MMT) has indicated that the visual modality is either dispensable or offers only marginal advantages.

Detecting Concrete Visual Tokens for Multimodal Machine Translation

no code yet • 5 Mar 2024

The challenge of visual grounding and masking in multimodal machine translation (MMT) systems has encouraged varying approaches to the detection and selection of visually-grounded text tokens for masking.

Adding Multimodal Capabilities to a Text-only Translation Model

no code yet • 5 Mar 2024

While most current work in multimodal machine translation (MMT) uses the Multi30k dataset for training and evaluation, we find that the resulting models overfit to the Multi30k dataset to an extreme degree.

The Case for Evaluating Multimodal Translation Models on Text Datasets

no code yet • 5 Mar 2024

Therefore, we propose that MMT models be evaluated using 1) the CoMMuTE evaluation framework, which measures the use of visual information by MMT models, 2) the text-only WMT news translation task test sets, which evaluate translation performance against complex sentences, and 3) the Multi30k test sets, for measuring MMT model performance against a real MMT dataset.
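The core of a CoMMuTE-style contrastive check can be reduced to a few lines: given an ambiguous source sentence, a disambiguating image, and a correct/incorrect translation pair, the model should assign the correct translation the higher score. The snippet below is a rough sketch only; `score` is a hypothetical hook (for example, the model's log-probability of the target given the source and image), not part of any released CoMMuTE tooling.

```python
def contrastive_accuracy(examples, score):
    """Fraction of examples where the matching translation outscores the
    mismatched one when the model is shown the disambiguating image."""
    correct = 0
    for ex in examples:
        # ex: {"src": str, "image": ..., "good_tgt": str, "bad_tgt": str}
        good = score(ex["src"], ex["image"], ex["good_tgt"])
        bad = score(ex["src"], ex["image"], ex["bad_tgt"])
        correct += int(good > bad)
    return correct / len(examples)
```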

A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation

no code yet • 12 Jun 2023

Large language models such as BERT and the GPT series started a paradigm shift that calls for building general-purpose models via pre-training on large datasets, followed by fine-tuning on task-specific datasets.

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

no code yet • 28 May 2023

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language.

Iterative Adversarial Attack on Image-guided Story Ending Generation

no code yet • 16 May 2023

Multimodal learning involves developing models that can integrate information from various sources like images and texts.

Generalization algorithm of multimodal pre-training model based on graph-text self-supervised training

no code yet • 16 Feb 2023

In this paper, a multimodal pre-training generalization algorithm for self-supervised training is proposed, which overcomes the lack and inaccuracy of visual information and thus extends the applicability of images to NMT.

Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

no code yet • 20 Dec 2022

Therefore, this paper establishes new methods and new datasets for MMT.

ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation

no code yet • 9 Nov 2022

Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance.