Search Results for author: Sandro Pezzelle

Found 25 papers, 16 papers with code

Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via CLIP

1 code implementation • CMCL (ACL) 2022 • Ece Takmaz, Sandro Pezzelle, Raquel Fernández

In this work, we use a transformer-based pre-trained multimodal model, CLIP, to shed light on the mechanisms employed by human speakers when referring to visual entities.
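
As a rough illustration (not the paper's code; the image file and candidate utterances below are hypothetical), CLIP can score referring utterances against an image via the Hugging Face transformers API:

```python
# Sketch: scoring candidate referring utterances against an image with CLIP.
# "scene.jpg" and the utterances are placeholder examples.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")
utterances = ["the red mug on the left", "the mug", "the thing"]

inputs = processor(text=utterances, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity, one score per utterance

for utt, score in zip(utterances, logits.squeeze(0).tolist()):
    print(f"{score:6.2f}  {utt}")
```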

Descriptive

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms

no code implementations • 26 Mar 2024 • Michael Hanna, Sandro Pezzelle, Yonatan Belinkov

Most studies determine which edges belong in an LM's circuit by performing causal interventions on each edge independently, but this scales poorly with model size.
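
A toy sketch of that per-edge intervention loop (the "model" here is just a weighted edge set, purely for illustration, not a real transformer) shows why the cost grows with the number of edges:

```python
# Toy illustration of independent per-edge causal interventions.
# A real circuit-finding setup patches activations in a transformer;
# here the "model" is a trivial weighted sum over named edges.
def forward(edges, x):
    return sum(w * x for w in edges.values())

edges = {("emb", "attn.0"): 0.5, ("attn.0", "mlp.0"): 1.2, ("mlp.0", "out"): 0.8}
baseline = forward(edges, 1.0)

effects = {}
for e in edges:                      # one forward pass per edge: O(|E|) passes
    ablated = dict(edges)
    ablated[e] = 0.0                 # intervene on this edge alone
    effects[e] = baseline - forward(ablated, 1.0)

# edges with the largest causal effect are kept in the circuit
print(sorted(effects.items(), key=lambda kv: -abs(kv[1])))
```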

Language Modelling

Naming, Describing, and Quantifying Visual Objects in Humans and LLMs

1 code implementation • 11 Mar 2024 • Alberto Testoni, Juell Sprott, Sandro Pezzelle

While human speakers use a variety of different expressions when describing the same object in an image, giving rise to a distribution of plausible labels driven by pragmatic constraints, the extent to which current Vision & Language Large Language Models (VLLMs) can mimic this crucial feature of language use is an open question.

Do Pre-Trained Language Models Detect and Understand Semantic Underspecification? Ask the DUST!

1 code implementation • 19 Feb 2024 • Frank Wildenburg, Michael Hanna, Sandro Pezzelle

In this work, we propose a novel Dataset of semantically Underspecified Sentences grouped by Type (DUST) and use it to study whether pre-trained language models (LMs) correctly identify and interpret underspecified sentences.

Sentence

Describing Images Fast and Slow: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes

1 code implementation • 2 Feb 2024 • Ece Takmaz, Sandro Pezzelle, Raquel Fernández

There is an intricate relation between the properties of an image and how humans behave while describing the image.

GROOViST: A Metric for Grounding Objects in Visual Storytelling

1 code implementation • 26 Oct 2023 • Aditya K Surikuchi, Sandro Pezzelle, Raquel Fernández

A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding.
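
A much-simplified sketch of such a grounding check (not the GROOViST implementation; the file names and nouns are placeholders) could score the story's nouns against the image sequence with CLIP:

```python
# Simplified grounding proxy: how well does each story noun match
# at least one image in the sequence? Not the actual GROOViST metric.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(f"frame_{i}.jpg") for i in range(5)]   # hypothetical image sequence
nouns = ["dog", "ball", "park"]                             # nouns extracted from the story

inputs = processor(text=nouns, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    sims = model(**inputs).logits_per_text                  # (num_nouns, num_images)

# treat a noun as grounded if it aligns strongly with some image
score = sims.max(dim=1).values.mean().item()
print(f"grounding score: {score:.2f}")
```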

Visual Grounding • Visual Storytelling

When Language Models Fall in Love: Animacy Processing in Transformer Language Models

1 code implementation • 23 Oct 2023 • Michael Hanna, Yonatan Belinkov, Sandro Pezzelle

However, we also show that even when presented with stories about atypically animate entities, such as a peanut in love, LMs adapt: they treat these entities as animate, though they do not adapt as well as humans.

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models

1 code implementation • 23 Oct 2023 • Xinyi Chen, Raquel Fernández, Sandro Pezzelle

Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction.

In-Context Learning

Dealing with Semantic Underspecification in Multimodal NLP

1 code implementation • 8 Jun 2023 • Sandro Pezzelle

Intelligent systems that aim at mastering language as humans do must deal with its semantic underspecification, namely, the possibility for a linguistic signal to convey only part of the information needed for communication to succeed.

Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind

1 code implementation • 31 May 2023 • Ece Takmaz, Nicolo' Brandizzi, Mario Giulianelli, Sandro Pezzelle, Raquel Fernández

Inspired by psycholinguistic theories, we endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
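
In pseudocode form, the adaptation loop the abstract describes might look like this (every object and method here is a hypothetical stand-in, not the paper's API):

```python
# Toy sketch of audience-aware adaptation via a simulated listener.
# speaker, listener_sim, referent, and context are hypothetical objects.
def adapt_utterance(speaker, listener_sim, referent, context, max_tries=5):
    utterance = speaker.plan(referent, context)
    for _ in range(max_tries):
        # simulate the listener: would they resolve this utterance correctly?
        if listener_sim.resolve(utterance, context) == referent:
            return utterance                 # predicted to succeed
        utterance = speaker.revise(utterance, referent, context)
    return utterance                         # give up after max_tries
```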

Language Modelling • Open-Ended Question Answering +1

A Psycholinguistic Analysis of BERT's Representations of Compounds

1 code implementation • 14 Feb 2023 • Lars Buijtelaar, Sandro Pezzelle

We build on recent studies that explore semantic information in Transformers at the word level and test whether BERT aligns with human semantic intuitions when dealing with expressions (e.g., sunlight) whose overall meaning depends -- to varying extents -- on the semantics of the constituent words (sun, light).
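
A minimal sketch of this kind of probe (assuming mean-pooled last-layer BERT states as word representations, which is one plausible choice among several, not necessarily the paper's):

```python
# Compare BERT's representation of a compound to its constituents.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(word: str) -> torch.Tensor:
    ids = tok(word, return_tensors="pt")
    with torch.no_grad():
        states = bert(**ids).last_hidden_state[0, 1:-1]  # drop [CLS]/[SEP]
    return states.mean(dim=0)                            # mean-pool subtokens

cos = torch.nn.functional.cosine_similarity
compound, sun, light = embed("sunlight"), embed("sun"), embed("light")
print("sunlight ~ sun:  ", cos(compound, sun, dim=0).item())
print("sunlight ~ light:", cos(compound, light, dim=0).item())
```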

EaSe: A Diagnostic Tool for VQA based on Answer Diversity

1 code implementation • NAACL 2021 • Shailza Jolly, Sandro Pezzelle, Moin Nabi

We propose EaSe, a simple diagnostic tool for Visual Question Answering (VQA) that quantifies the difficulty of an image-question sample.
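
As a simplified stand-in for the idea (this is not the actual EaSe formula, which also factors in the semantic similarity of answers), one can quantify difficulty as the normalized entropy of the human answer distribution:

```python
# Difficulty from answer diversity: 0.0 = full agreement, 1.0 = all distinct.
from collections import Counter
from math import log2

def difficulty(answers):
    counts = Counter(answers)
    n = len(answers)
    if len(counts) <= 1:
        return 0.0
    entropy = -sum(c / n * log2(c / n) for c in counts.values())
    return entropy / log2(n)  # normalize by the maximum possible entropy

print(difficulty(["red"] * 10))                          # 0.0 -> easy sample
print(difficulty(["red", "maroon", "crimson", "pink"]))  # 1.0 -> hard sample
```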

Question Answering • Visual Question Answering

Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts

no code implementations • 27 Aug 2019 • Sandro Pezzelle, Raquel Fernández

This work aims at modeling how the meaning of gradable adjectives of size ('big', 'small') can be learned from visually-grounded contexts.

Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

1 code implementation • NAACL 2018 • Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi

The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model.
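
Schematically (the layer sizes and output granularities below are invented for illustration, not taken from the paper), such a multi-task model shares one visual encoder across three task-specific heads:

```python
# Schematic multi-task model: one shared encoder, three quantification heads.
# All dimensions here are hypothetical.
import torch.nn as nn

class MultiTaskQuantifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.set_comparison = nn.Linear(hidden, 3)    # fewer / same / more
        self.vague_quantifier = nn.Linear(hidden, 9)  # none ... most ... all
        self.proportion = nn.Linear(hidden, 17)       # binned proportions

    def forward(self, visual_feats):
        h = self.shared(visual_feats)                 # shared representation
        return (self.set_comparison(h),
                self.vague_quantifier(h),
                self.proportion(h))
```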

FOIL it! Find One mismatch between Image and Language caption

no code implementations • ACL 2017 • Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities.

Pay Attention to Those Sets! Learning Quantification from Images

no code implementations • 10 Apr 2017 • Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi

However, we argue that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is performed most efficiently by refining the system's approximate numerosity estimator.

Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

no code implementations • EACL 2017 • Sandro Pezzelle, Marco Marelli, Raffaella Bernardi

People can refer to quantities in a visual scene by using either exact cardinals (e.g., one, two, three) or natural language quantifiers (e.g., few, most, all).
