1 code implementation • CMCL (ACL) 2022 • Ece Takmaz, Sandro Pezzelle, Raquel Fernández
In this work, we use a transformer-based pre-trained multimodal model, CLIP, to shed light on the mechanisms employed by human speakers when referring to visual entities.
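The abstract does not spell out the scoring mechanism, but the CLIP model it builds on relates images and text by cosine similarity between embeddings in a shared space. A minimal sketch of that idea with toy, made-up embeddings (the vectors and captions below are illustrative stand-ins, not outputs of the actual CLIP encoders):

```python
from math import sqrt

def cosine_similarity(u, v):
    # Cosine similarity: dot product of the vectors divided by
    # the product of their norms, as in CLIP's image-text scoring.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical stand-in for a CLIP image embedding.
image_embedding = [0.2, 0.8, 0.1]

# Hypothetical stand-ins for CLIP text embeddings of candidate descriptions.
caption_embeddings = {
    "a dog on the beach": [0.25, 0.75, 0.05],
    "a city skyline": [0.9, 0.1, 0.3],
}

# Rank the candidate descriptions by similarity to the image --
# the kind of alignment signal such models provide for studying
# how speakers refer to visual entities.
ranked = sorted(
    caption_embeddings.items(),
    key=lambda kv: cosine_similarity(image_embedding, kv[1]),
    reverse=True,
)
best_caption = ranked[0][0]
```

In the real model the two embeddings come from separate image and text encoders trained jointly, and similarities are scaled by a learned temperature before a softmax; this sketch only shows the shared-space comparison at the core of that setup.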
no code implementations • CMCL (ACL) 2022 • Ece Takmaz
In this paper, we present the details of our approaches that attained the second place in the shared task of the ACL 2022 Cognitive Modeling and Computational Linguistics Workshop.
1 code implementation • 2 Feb 2024 • Ece Takmaz, Sandro Pezzelle, Raquel Fernández
There is an intricate relation between the properties of an image and how humans behave while describing the image.
1 code implementation • 31 May 2023 • Ece Takmaz, Nicolò Brandizzi, Mario Giulianelli, Sandro Pezzelle, Raquel Fernández
Inspired by psycholinguistic theories, we endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective.
1 code implementation • EMNLP 2020 • Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, Raquel Fernández
When speakers describe an image, they tend to look at objects before mentioning them.
no code implementations • EMNLP 2020 • Ece Takmaz, Mario Giulianelli, Sandro Pezzelle, Arabella Sinclair, Raquel Fernández
We propose a generation model that produces referring utterances grounded in both the visual and the conversational context.
no code implementations • ACL 2019 • Janosch Haber, Tim Baumgärtner, Ece Takmaz, Lieke Gelderloos, Elia Bruni, Raquel Fernández
This paper introduces the PhotoBook dataset, a large-scale collection of visually grounded, task-oriented dialogues in English, designed to investigate the shared dialogue history that accumulates over the course of a conversation.
no code implementations • WS 2019 • Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi
The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the `Hub and Spoke' architecture, which was proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs.