1 code implementation • CMCL (ACL) 2022 • Ece Takmaz, Sandro Pezzelle, Raquel Fernández
In this work, we use a transformer-based pre-trained multimodal model, CLIP, to shed light on the mechanisms employed by human speakers when referring to visual entities.
1 code implementation • ACL (RepL4NLP) 2021 • Iuliia Parfenova, Desmond Elliott, Raquel Fernández, Sandro Pezzelle
We investigate the representations learned by vision and language models in tasks that require relational reasoning.
no code implementations • 26 Mar 2024 • Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
Most studies determine which edges belong in an LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size.
1 code implementation • 11 Mar 2024 • Alberto Testoni, Juell Sprott, Sandro Pezzelle
While human speakers use a variety of different expressions when describing the same object in an image, giving rise to a distribution of plausible labels driven by pragmatic constraints, the extent to which current Vision & Language Large Language Models (VLLMs) can mimic this crucial feature of language use is an open question.
1 code implementation • 19 Feb 2024 • Frank Wildenburg, Michael Hanna, Sandro Pezzelle
In this work, we propose a novel Dataset of semantically Underspecified Sentences grouped by Type (DUST) and use it to study whether pre-trained language models (LMs) correctly identify and interpret underspecified sentences.
1 code implementation • 2 Feb 2024 • Ece Takmaz, Sandro Pezzelle, Raquel Fernández
There is an intricate relation between the properties of an image and how humans behave while describing the image.
1 code implementation • 26 Oct 2023 • Aditya K Surikuchi, Sandro Pezzelle, Raquel Fernández
A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding.
1 code implementation • 23 Oct 2023 • Michael Hanna, Yonatan Belinkov, Sandro Pezzelle
However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans.
1 code implementation • 23 Oct 2023 • Xinyi Chen, Raquel Fernández, Sandro Pezzelle
Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction.
no code implementations • 3 Jul 2023 • Giovanni Cinà, Daniel Fernandez-Llaneza, Ludovico Deponte, Nishant Mishra, Tabea E. Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, Ş. İlker Birbil
Feature attribution methods have become a staple for disentangling the complex behavior of black-box models.
1 code implementation • 8 Jun 2023 • Sandro Pezzelle
Intelligent systems that aim at mastering language as humans do must deal with its semantic underspecification, namely, the possibility for a linguistic signal to convey only part of the information needed for communication to succeed.
1 code implementation • 31 May 2023 • Ece Takmaz, Nicolò Brandizzi, Mario Giulianelli, Sandro Pezzelle, Raquel Fernández
Inspired by psycholinguistic theories, we endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
1 code implementation • 14 Feb 2023 • Lars Buijtelaar, Sandro Pezzelle
We build on recent studies that explore semantic information in Transformers at the word level and test whether BERT aligns with human semantic intuitions when dealing with expressions (e.g., sunlight) whose overall meaning depends -- to a varying extent -- on the semantics of the constituent words (sun, light).
1 code implementation • NAACL 2021 • Shailza Jolly, Sandro Pezzelle, Moin Nabi
We propose EASE, a simple diagnostic tool for Visual Question Answering (VQA) which quantifies the difficulty of an image-question sample.
no code implementations • EMNLP 2020 • Ece Takmaz, Mario Giulianelli, Sandro Pezzelle, Arabella Sinclair, Raquel Fernández
We propose a generation model that produces referring utterances grounded in both the visual and the conversational context.
1 code implementation • EMNLP 2020 • Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, Raquel Fernández
When speakers describe an image, they tend to look at objects before mentioning them.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Sandro Pezzelle, Claudio Greco, Greta Gandolfi, Eleonora Gualdoni, Raffaella Bernardi
This paper introduces BD2BB, a novel language and vision benchmark that requires multimodal models to combine complementary information from the two modalities.
no code implementations • 27 Aug 2019 • Sandro Pezzelle, Raquel Fernández
This work aims at modeling how the meaning of gradable adjectives of size ('big', 'small') can be learned from visually-grounded contexts.
no code implementations • 12 Sep 2018 • Shailza Jolly, Sandro Pezzelle, Tassilo Klein, Andreas Dengel, Moin Nabi
We show that our metric is effective in providing a more fine-grained evaluation both on the quantitative and qualitative level.
1 code implementation • ACL 2018 • Sandro Pezzelle, Shane Steinert-Threlkeld, Raffaella Bernardi, Jakub Szymanik
We study the role of linguistic context in predicting quantifiers ('few', 'all').
1 code implementation • NAACL 2018 • Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi
The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model.
no code implementations • ACL 2017 • Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities.
no code implementations • 10 Apr 2017 • Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi
We however argue that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system.
no code implementations • EACL 2017 • Sandro Pezzelle, Marco Marelli, Raffaella Bernardi
People can refer to quantities in a visual scene by using either exact cardinals (e.g., one, two, three) or natural language quantifiers (e.g., few, most, all).
2 code implementations • ACL 2016 • Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, Raquel Fernández
We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task.