no code implementations • 29 Nov 2023 • Daniela Massiceti, Camilla Longden, Agnieszka Słowik, Samuel Wills, Martin Grayson, Cecily Morrison
Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower on average for images captured by BLV users than web-crawled images.
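The zero-shot protocol behind this comparison can be sketched in a few lines: embed the image and one text prompt per class, then pick the class whose prompt embedding has the highest cosine similarity with the image embedding. This is a minimal illustration, assuming the embeddings have already been produced by a CLIP encoder; the function name and toy vectors are illustrative, not the paper's code.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Return the index of the class prompt most similar to the image.

    image_emb: (d,) image embedding; text_embs: (num_classes, d) prompt
    embeddings. Both are L2-normalised before the dot product, so the
    score is cosine similarity.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(txt @ img))

# Toy embeddings: the image points mostly along the first class direction.
image = np.array([0.9, 0.1, 0.0])
prompts = np.array([[1.0, 0.0, 0.0],   # e.g. "a photo of a mug"
                    [0.0, 1.0, 0.0],   # e.g. "a photo of a keyboard"
                    [0.0, 0.0, 1.0]])  # e.g. "a photo of a door"
print(zero_shot_classify(image, prompts))  # -> 0
```

The accuracy gap reported above is measured under exactly this kind of protocol, run once on web-crawled images and once on images captured by BLV users.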
no code implementations • 3 Oct 2023 • Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi
From both the human study and the automated evaluation, we find that: (i) Instruct-Pix2Pix, Null-Text and SINE are the top-performing methods averaged across different edit types; however, only Instruct-Pix2Pix and Null-Text are able to preserve original image properties; (ii) most of the editing methods fail at edits involving spatial operations (e.g., changing the position of an object).
1 code implementation • 5 Aug 2023 • JianFeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz
This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost.
no code implementations • 18 Jul 2023 • Samyadeep Basu, Maziar Sanjabi, Daniela Massiceti, Shell Xu Hu, Soheil Feizi
On the challenging Winoground compositional reasoning benchmark, our method improves the absolute visio-linguistic performance of different CLIP models by up to 7%, while on the ARO dataset it improves visio-linguistic performance by up to 3%.
no code implementations • 4 Apr 2023 • Samyadeep Basu, Daniela Massiceti, Shell Xu Hu, Soheil Feizi
Through our controlled empirical study, we have two main findings: (i) fine-tuning just the LayerNorm parameters (which we call LN-Tune) during few-shot adaptation is an extremely strong baseline across ViTs pre-trained with both self-supervised and supervised objectives; (ii) for self-supervised ViTs, simply learning a set of scaling parameters for each attention matrix (which we call AttnScale), along with a domain-residual adapter (DRA) module, leads to state-of-the-art performance on MD while being ~9x more parameter-efficient.
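The LN-Tune baseline described above amounts to freezing a pre-trained backbone and re-enabling gradients only for its LayerNorm parameters. A hedged sketch, where a tiny `nn.Sequential` stands in for a real pre-trained ViT (this is not the paper's code, just the freezing pattern):

```python
import torch.nn as nn

# Stand-in for a pre-trained ViT backbone: two LayerNorm + Linear pairs.
backbone = nn.Sequential(
    nn.LayerNorm(16),
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 16),
)

# LN-Tune pattern: freeze everything first...
for p in backbone.parameters():
    p.requires_grad = False

# ...then unfreeze only the LayerNorm weights and biases.
for m in backbone.modules():
    if isinstance(m, nn.LayerNorm):
        for p in m.parameters():
            p.requires_grad = True

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
total = sum(p.numel() for p in backbone.parameters())
print(trainable, total)  # 2 LayerNorms x (16 weights + 16 biases) = 64 of 608
```

Passing only the `requires_grad=True` parameters to the optimizer then adapts the model with a tiny fraction of its weights, which is what makes this baseline so parameter-efficient.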
1 code implementation • 3 Jul 2022 • JianFeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou
Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data.
2 code implementations • NeurIPS 2021 • John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
1 code implementation • ICCV 2021 • Daniela Massiceti, Luisa Zintgraf, John Bronskill, Lida Theodorou, Matthew Tobias Harris, Edward Cutrell, Cecily Morrison, Katja Hofmann, Simone Stumpf
To close this gap, we present the ORBIT dataset and benchmark, grounded in the real-world application of teachable object recognizers for people who are blind/low-vision.
1 code implementation • 20 Apr 2020 • Daniela Massiceti, Viveka Kulharia, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr
Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge.
2 code implementations • 16 Dec 2018 • Daniela Massiceti, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr
We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue - a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli.
no code implementations • CVPR 2018 • Daniela Massiceti, N. Siddharth, Puneet K. Dokania, Philip H. S. Torr
We are the first to extend this paradigm to full two-way visual dialogue, where our model is capable of generating both questions and answers in sequence based on a visual input, for which we propose a set of novel evaluation measures and metrics.
no code implementations • 7 Dec 2016 • Qibin Hou, Puneet Kumar Dokania, Daniela Massiceti, Yunchao Wei, Ming-Ming Cheng, Philip Torr
We focus on the following three aspects of EM: (i) initialization; (ii) latent posterior estimation (E-step) and (iii) the parameter update (M-step).
Weakly-Supervised Semantic Segmentation
no code implementations • 19 Sep 2016 • Daniela Massiceti, Alexander Krull, Eric Brachmann, Carsten Rother, Philip H. S. Torr
This work addresses the task of camera localization in a known 3D scene given a single input RGB image.