no code implementations • 27 Sep 2021 • Ekta Sood, Fabian Kögel, Philipp Müller, Dominike Thomas, Mihai Bace, Andreas Bulling
We present the Multimodal Human-like Attention Network (MULAN) - the first method for multimodal integration of human-like attention on image and text during training of VQA models.