no code implementations • SIGDIAL (ACL) 2022 • Alessandro Suglia, Bhathiya Hemanthage, Malvina Nikandrou, George Pantazopoulos, Amit Parekh, Arash Eshghi, Claudio Greco, Ioannis Konstas, Oliver Lemon, Verena Rieser
We demonstrate EMMA, an embodied multimodal agent which has been developed for the Alexa Prize SimBot challenge.
no code implementations • 7 Nov 2023 • Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia
Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation.
no code implementations • 28 Apr 2023 • Lu Yu, Malvina Nikandrou, Jiali Jin, Verena Rieser
In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people.
1 code implementation • 8 Nov 2022 • Alessandro Suglia, José Lopes, Emanuele Bastianelli, Andrea Vanzo, Shubham Agarwal, Malvina Nikandrou, Lu Yu, Ioannis Konstas, Verena Rieser
As the course of a game is unpredictable, so are commentaries, which makes them a unique resource to investigate dynamic language grounding.