Search Results for author: Georgios Pantazopoulos

Found 2 papers, 1 paper with code

Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers

1 code implementation • 21 Apr 2024 • Georgios Pantazopoulos, Alessandro Suglia, Oliver Lemon, Arash Eshghi

In this paper, we use diagnostic classifiers to measure the extent to which the visual prompt produced by the resampler encodes spatial information.

Image Captioning, Question Answering, +1

Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

No code implementations • 7 Nov 2023 • Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia

Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models: 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation.

Decoder, Text Generation
