1 code implementation • 2 Jun 2023 • Fabian Kögel, Bac Nguyen, Fabien Cardinaux
State-of-the-art non-autoregressive text-to-speech (TTS) models based on FastSpeech 2 can efficiently synthesise high-fidelity and natural speech.
no code implementations • 27 Sep 2021 • Ekta Sood, Fabian Kögel, Philipp Müller, Dominike Thomas, Mihai Bace, Andreas Bulling
We present the Multimodal Human-like Attention Network (MULAN), the first method for multimodal integration of human-like attention on image and text during training of VQA models.
no code implementations • CoNLL (EMNLP) 2021 • Ekta Sood, Fabian Kögel, Florian Strohm, Prajit Dhar, Andreas Bulling
We present VQA-MHUG, a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA), collected using a high-speed eye tracker.