no code implementations • RepL4NLP (ACL) 2022 • Adnen Abdessaied, Ekta Sood, Andreas Bulling
We propose the Video Language Co-Attention Network (VLCN) – a novel memory-enhanced model for Video Question Answering (VideoQA).
no code implementations • 30 Apr 2022 • Ahmed Abdou, Ekta Sood, Philipp Müller, Andreas Bulling
Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g., speech during a phone call.
no code implementations • 27 Sep 2021 • Ekta Sood, Fabian Kögel, Philipp Müller, Dominike Thomas, Mihai Bace, Andreas Bulling
We present the Multimodal Human-like Attention Network (MULAN) – the first method for multimodal integration of human-like attention on image and text during training of VQA models.
no code implementations • CoNLL (EMNLP) 2021 • Ekta Sood, Fabian Kögel, Florian Strohm, Prajit Dhar, Andreas Bulling
We present VQA-MHUG - a novel 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker.
no code implementations • ICCV 2021 • Florian Strohm, Ekta Sood, Sven Mayer, Philipp Müller, Mihai Bâce, Andreas Bulling
The encoder extracts image features and predicts a neural activation map for each face looked at by a human observer.
no code implementations • NeurIPS 2020 • Ekta Sood, Simon Tannert, Philipp Mueller, Andreas Bulling
A lack of corpora has so far limited advances in integrating human gaze data as a supervisory signal in neural attention mechanisms for natural language processing (NLP).
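The entry above concerns using human gaze as a supervisory signal for neural attention. A minimal sketch of one common formulation of such supervision -- an illustrative assumption on our part, not necessarily this paper's exact objective -- adds a KL-divergence term that pulls the model's attention distribution toward the human gaze distribution:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    # KL(p || q) between two discrete distributions over tokens;
    # eps guards against log(0) for zero-probability entries.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def gaze_supervised_loss(task_loss, model_attention, human_gaze, lam=0.1):
    # Hypothetical combined objective (names and weighting are assumptions):
    # the task loss plus a gaze term encouraging the model's attention
    # weights to match where humans actually looked.
    return task_loss + lam * kl_divergence(human_gaze, model_attention)
```

With identical attention and gaze distributions the gaze term vanishes and only the task loss remains; the weight `lam` trades off task performance against human-likeness of the attention.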
no code implementations • CoNLL 2020 • Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu
We compare state-of-the-art networks based on long short-term memory (LSTM), convolutional neural networks (CNN), and XLNet Transformer architectures.
1 code implementation • CoNLL 2018 • Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu
We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset.
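To make the compare-aggregate idea mentioned above concrete, here is a minimal sketch (our own simplification, not the paper's actual two-staged model): each context word vector is first *compared* with the question representation, and the comparison vectors are then *aggregated* with attention weights derived from word-question relevance:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def compare_aggregate(context, question):
    # Compare stage: element-wise product of each context word vector
    # with the question vector (a common comparison function).
    compared = [[c * q for c, q in zip(word, question)] for word in context]
    # Attention weights: softmax over word-question relevance scores.
    weights = softmax([dot(word, question) for word in context])
    # Aggregate stage: attention-weighted sum of the comparison vectors.
    dim = len(question)
    return [sum(w * vec[d] for w, vec in zip(weights, compared))
            for d in range(dim)]
```

In the full model the aggregated vector would be scored against each candidate answer; here it simply illustrates how comparison and attention-based aggregation compose.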