Multimodal Machine Translation
35 papers with code • 3 benchmarks • 5 datasets
Multimodal machine translation is the task of performing machine translation with multiple data sources; for example, translating the English sentence "a bird is flying over water", together with an image of a bird over water, into German text.
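At a high level, an MMT system conditions the translation on both the source sentence and image features. A minimal NumPy sketch of one common fusion strategy — concatenating a sentence encoding with an image feature vector and projecting back to the model dimension — with all names and dimensions hypothetical, not taken from any specific system:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_text_and_image(text_enc, img_feat, w):
    """Concatenate a sentence encoding with an image feature vector
    and project back to the model dimension (a simple, illustrative
    fusion; real systems vary)."""
    joint = np.concatenate([text_enc, img_feat])
    return np.tanh(w @ joint)

d_text, d_img, d_model = 8, 4, 8
text_enc = rng.standard_normal(d_text)  # encoder output for "a bird is flying over water"
img_feat = rng.standard_normal(d_img)   # e.g. CNN features of the bird image
w = rng.standard_normal((d_model, d_text + d_img))

fused = fuse_text_and_image(text_enc, img_feat, w)
print(fused.shape)  # (8,)
```

The fused vector would then feed the decoder in place of (or alongside) the text-only context.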
(Image credit: Findings of the Third Shared Task on Multimodal Machine Translation)
Latest papers
Dynamic Context-guided Capsule Network for Multimodal Machine Translation
In particular, we represent the input image with global and regional visual features, and we introduce two parallel DCCNs to model multimodal context vectors with visual features at different granularities.
Multimodal Transformer for Multimodal Machine Translation
Multimodal Machine Translation (MMT) aims to introduce information from another modality, generally static images, to improve translation quality.
Self-Knowledge Distillation with Progressive Refinement of Targets
Hence, it can be interpreted within the framework of knowledge distillation, in which the student becomes its own teacher.
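The "student becomes its own teacher" idea can be illustrated by softening the one-hot training target with the model's own earlier prediction, with the mixing weight growing over training. This is a generic self-distillation sketch under assumed notation, not the paper's exact formulation:

```python
import numpy as np

def refined_target(one_hot, past_pred, alpha):
    """Mix the hard label with the model's own past softmax output;
    larger alpha means more self-distillation."""
    return (1.0 - alpha) * one_hot + alpha * past_pred

one_hot = np.array([0.0, 1.0, 0.0])
past_pred = np.array([0.1, 0.7, 0.2])  # model's earlier prediction (hypothetical)
for alpha in [0.0, 0.3, 0.6]:          # weight progressively increased
    t = refined_target(one_hot, past_pred, alpha)
    assert np.isclose(t.sum(), 1.0)    # still a valid distribution
    print(alpha, t)
```

Training then minimizes cross-entropy against the refined target rather than the raw one-hot label.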
M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.
Distilling Translations with Visual Awareness
Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient.
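One way to act on this observation is to gate the visual contribution so it only enters the decoder state when the textual context is insufficient. A generic gating sketch (hypothetical names, not the paper's exact method):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_ctx, img_ctx, w_gate):
    """A scalar gate, computed from both contexts, decides how much
    visual information is mixed into the decoder context."""
    g = sigmoid(w_gate @ np.concatenate([text_ctx, img_ctx]))
    return text_ctx + g * img_ctx, g

rng = np.random.default_rng(1)
text_ctx = rng.standard_normal(6)   # textual context vector
img_ctx = rng.standard_normal(6)    # visual context vector
w_gate = rng.standard_normal(12)

fused, g = gated_fusion(text_ctx, img_ctx, w_gate)
print(round(float(g), 3))           # gate value in (0, 1)
```

With an ambiguous word the learned gate would open (g near 1); with sufficient textual context it would stay near 0, leaving the translation text-driven.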
Multimodal Machine Translation with Embedding Prediction
Multimodal machine translation is an attractive application of neural machine translation (NMT).
Latent Variable Model for Multi-modal Translation
In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.
UMONS Submission for WMT18 Multimodal Translation Task
This paper describes the UMONS solution for the Multimodal Machine Translation Task presented at the third conference on machine translation (WMT18).
Findings of the Third Shared Task on Multimodal Machine Translation
In this task, a source sentence in English is supplemented by an image, and participating systems are required to translate the sentence into German, French or Czech.
A Visual Attention Grounding Neural Model for Multimodal Machine Translation
The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics.
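Such grounding can be sketched as dot-product attention in which a textual representation attends over image-region features to produce a visual context vector. This is a minimal illustration of the general mechanism, not the paper's specific architecture:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def visual_attention(text_query, regions):
    """Attend over image-region features with a textual query;
    returns the weighted visual context and the attention weights."""
    scores = regions @ text_query       # one score per region
    weights = softmax(scores)
    return weights @ regions, weights

rng = np.random.default_rng(2)
regions = rng.standard_normal((5, 6))   # 5 image regions, feature dim 6
text_query = rng.standard_normal(6)     # e.g. embedding of the word "bird"

ctx, w = visual_attention(text_query, regions)
print(ctx.shape, round(float(w.sum()), 3))
```

The attention weights link each word to the image regions that share its semantics, which is the grounding the model exploits.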