Good for Misconceived Reasons: Revisiting Neural Multimodal Machine Translation

1 Jan 2021 · Zhiyong Wu, Lingpeng Kong, Ben Kao

A neural multimodal machine translation (MMT) system aims to produce better translations by extending a conventional text-only translation model with multimodal information. Many recent studies report improvements when equipping their models with a multimodal module, despite ongoing controversy over whether such improvements actually come from the multimodal part. We revisit recent developments in neural multimodal machine translation by proposing two *interpretable* MMT models that achieve new state-of-the-art results on the standard Multi30K dataset. To our surprise, however, while we observe gains similar to those of recently developed multimodal-integrated models, our models learn to *ignore* the multimodal information. Upon further investigation, we discover that the improvements brought about by multimodal models over their text-only counterparts are in fact the result of a regularization effect. We report empirical findings that underscore the importance of interpretability in MMT models and set new paradigms for future MMT research.
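As a concrete illustration of what an interpretable multimodal module can look like, below is a minimal PyTorch sketch of gated fusion in the spirit the abstract describes: a learned gate weights the projected image features before they are added to the text representation, so the gate value itself reveals how much the model relies on the visual input. The class name `GatedFusion`, the dimensions, and the exact gating form are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Gated fusion of text and image features for MMT (illustrative sketch).

    The gate makes the model interpretable: a gate that collapses toward
    zero means the visual modality is ignored and the model degenerates
    to its text-only counterpart.
    """

    def __init__(self, text_dim: int, img_dim: int):
        super().__init__()
        # Project image features into the text representation space.
        self.img_proj = nn.Linear(img_dim, text_dim)
        # Gate conditioned on both modalities, producing a value in (0, 1).
        self.gate = nn.Linear(2 * text_dim, 1)

    def forward(self, h_text: torch.Tensor, h_img: torch.Tensor) -> torch.Tensor:
        # h_text: (batch, seq_len, text_dim); h_img: (batch, seq_len, img_dim),
        # e.g. a global image feature tiled across the sequence.
        h_img = self.img_proj(h_img)
        lam = torch.sigmoid(self.gate(torch.cat([h_text, h_img], dim=-1)))
        # lam -> 0 recovers the text-only model; inspecting lam at test time
        # shows how much the model actually uses the image.
        return h_text + lam * h_img


# Hypothetical usage with made-up dimensions:
fusion = GatedFusion(text_dim=512, img_dim=2048)
h_text = torch.randn(8, 20, 512)                    # encoder states for a batch of sentences
h_img = torch.randn(8, 1, 2048).expand(-1, 20, -1)  # one image feature per sentence, tiled
fused = fusion(h_text, h_img)                       # same shape as h_text
```

Under this design, monitoring the learned gate during training is what allows one to check whether improvements come from the image features themselves or merely from the extra parameters acting as a regularizer.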



