no code implementations • 13 Nov 2023 • Ilker Kesen, Andrea Pedrotti, Mustafa Dogan, Michele Cafagna, Emre Can Acikgoz, Letitia Parcalabescu, Iacer Calixto, Anette Frank, Albert Gatt, Aykut Erdem, Erkut Erdem
With the ever-increasing popularity of pretrained Video-Language Models (VidLMs), there is a pressing need to develop robust evaluation methodologies that delve deeper into their visio-linguistic capabilities.
no code implementations • 28 Apr 2023 • Michele Cafagna, Lina M. Rojas-Barahona, Kees Van Deemter, Albert Gatt
When applied to image-to-text models, interpretability methods often provide token-by-token explanations; that is, they compute a visual explanation for each token of the generated sequence.
1 code implementation • 23 Feb 2023 • Michele Cafagna, Kees Van Deemter, Albert Gatt
We present the High-Level Dataset, a dataset extending 14,997 images from the COCO dataset, aligned with a new set of 134,973 human-annotated (high-level) captions collected along three axes: scenes, actions, and rationales.
no code implementations • 9 Nov 2022 • Michele Cafagna, Kees Van Deemter, Albert Gatt
Image captioning models tend to describe images in an object-centric way, emphasising visible objects.
1 code implementation • ACL 2022 • Letitia Parcalabescu, Michele Cafagna, Lilitta Muradjan, Anette Frank, Iacer Calixto, Albert Gatt
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena.
no code implementations • 15 Sep 2021 • Michele Cafagna, Kees Van Deemter, Albert Gatt
Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate.
1 code implementation • ACL (EvalNLGEval, INLG) 2020 • Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell'Orletta, Malvina Nissim, Albert Gatt
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often considered the most reliable method compared to corpus-based metrics.
no code implementations • LREC 2020 • Rob van der Goot, Alan Ramponi, Tommaso Caselli, Michele Cafagna, Lorenzo De Mattei
However, for Italian, there is no benchmark available for lexical normalization, despite the presence of many benchmarks for other tasks involving social media data.
no code implementations • LREC 2020 • Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim
We automatically generate headlines that are expected to comply with the specific styles of two different Italian newspapers.
1 code implementation • 29 Apr 2020 • Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, Marco Guerini
We provide a thorough analysis of GePpeTto's quality by means of both an automatic and a human-based evaluation.