Search Results for author: Daniela Massiceti

Found 13 papers, 6 papers with code

Explaining CLIP's performance disparities on data from blind/low vision users

no code implementations • 29 Nov 2023 • Daniela Massiceti, Camilla Longden, Agnieszka Słowik, Samuel Wills, Martin Grayson, Cecily Morrison

Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower on average for images captured by BLV users than for web-crawled images.
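As a rough illustration of the zero-shot classification setting evaluated here, the sketch below scores a single image against a set of candidate class prompts with one CLIP checkpoint via the Hugging Face transformers library. The checkpoint name, image path, and class prompts are illustrative placeholders only, not the 25 variants or the BLV-captured data studied in the paper.

# Minimal CLIP zero-shot classification sketch (Hugging Face transformers).
# Checkpoint, image, and class prompts are placeholders, not the paper's setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # e.g. a user-captured photo
class_prompts = ["a photo of a mug", "a photo of a keyboard", "a photo of a wallet"]

inputs = processor(text=class_prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-to-text similarity scores
print(class_prompts[probs.argmax(dim=-1).item()])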

Few-Shot Learning Zero-Shot Learning

EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods

no code implementations • 3 Oct 2023 • Samyadeep Basu, Mehrdad Saberi, Shweta Bhardwaj, Atoosa Malemir Chegini, Daniela Massiceti, Maziar Sanjabi, Shell Xu Hu, Soheil Feizi

From both the human study and automated evaluation, we find that: (i) Instruct-Pix2Pix, Null-Text and SINE are the top-performing methods averaged across different edit types; however, only Instruct-Pix2Pix and Null-Text are able to preserve original image properties; (ii) most of the editing methods fail at edits involving spatial operations (e.g., changing the position of an object).

Benchmarking Text-Guided Image Editing

NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation

1 code implementation • 5 Aug 2023 • JianFeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz

This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost.

Segmentation Self-Driving Cars +3

Augmenting CLIP with Improved Visio-Linguistic Reasoning

no code implementations • 18 Jul 2023 • Samyadeep Basu, Maziar Sanjabi, Daniela Massiceti, Shell Xu Hu, Soheil Feizi

On the challenging Winoground compositional reasoning benchmark, our method improves the absolute visio-linguistic performance of different CLIP models by up to 7%, while on the ARO dataset, our method improves the visio-linguistic performance by up to 3%.

Retrieval Text Retrieval +2

Strong Baselines for Parameter Efficient Few-Shot Fine-tuning

no code implementations • 4 Apr 2023 • Samyadeep Basu, Daniela Massiceti, Shell Xu Hu, Soheil Feizi

Through our controlled empirical study, we have two main findings: (i) fine-tuning just the LayerNorm parameters (which we call LN-Tune) during few-shot adaptation is an extremely strong baseline across ViTs pre-trained with both self-supervised and supervised objectives; (ii) for self-supervised ViTs, we find that simply learning a set of scaling parameters for each attention matrix (which we call AttnScale) along with a domain-residual adapter (DRA) module leads to state-of-the-art performance (while being ~9x more parameter-efficient) on Meta-Dataset (MD).
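Below is a minimal PyTorch sketch of the LN-Tune idea described above: freeze a pre-trained ViT and update only its LayerNorm parameters (plus a new classification head) during few-shot adaptation. The timm checkpoint, head size, and optimizer settings are assumptions for illustration, not the paper's exact recipe.

# LN-Tune sketch: train only LayerNorm parameters of a frozen pre-trained ViT.
# Checkpoint, class count, and learning rate are illustrative assumptions.
import timm
import torch
import torch.nn as nn

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=5)

for param in model.parameters():
    param.requires_grad = False            # freeze the whole backbone

trainable = []
for module in model.modules():
    if isinstance(module, nn.LayerNorm):   # unfreeze LayerNorm affine parameters
        for param in module.parameters():
            param.requires_grad = True
            trainable.append(param)

# The new classification head also needs training for the few-shot classes.
for param in model.get_classifier().parameters():
    param.requires_grad = True
    trainable.append(param)

optimizer = torch.optim.AdamW(trainable, lr=1e-3)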

Few-Shot Image Classification

NP-Match: When Neural Processes meet Semi-Supervised Learning

1 code implementation • 3 Jul 2022 • JianFeng Wang, Thomas Lukasiewicz, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Alexandros Neophytou

Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data.
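To make the general idea of leveraging unlabeled data concrete, here is a generic confidence-thresholded pseudo-labeling loss in PyTorch, a common SSL training signal. This is only a minimal sketch and not NP-Match's neural-process-based approach; the threshold value is an arbitrary choice.

# Generic SSL loss: supervised term on labeled data plus a pseudo-label term on
# unlabeled data, keeping only confident predictions. Not NP-Match's method.
import torch
import torch.nn.functional as F

def ssl_loss(model, labeled_x, labels, unlabeled_x, threshold=0.95):
    # Supervised term on the small labeled batch.
    sup = F.cross_entropy(model(labeled_x), labels)

    # Pseudo-labels from the model's own confident predictions on unlabeled data.
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = (conf >= threshold).float()

    unsup = (F.cross_entropy(model(unlabeled_x), pseudo, reduction="none") * mask).mean()
    return sup + unsup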

Semi-Supervised Image Classification

A Revised Generative Evaluation of Visual Dialogue

1 code implementation • 20 Apr 2020 • Daniela Massiceti, Viveka Kulharia, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr

Evaluating Visual Dialogue, the task of answering a sequence of questions relating to a visual input, remains an open research challenge.

Visual Dialogue without Vision or Dialogue

2 code implementations • 16 Dec 2018 • Daniela Massiceti, Puneet K. Dokania, N. Siddharth, Philip H. S. Torr

We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue, a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli.

Question Answering Visual Dialog

FlipDial: A Generative Model for Two-Way Visual Dialogue

no code implementations • CVPR 2018 • Daniela Massiceti, N. Siddharth, Puneet K. Dokania, Philip H. S. Torr

We are the first to extend this paradigm to full two-way visual dialogue, where our model is capable of generating both questions and answers in sequence based on a visual input, for which we propose a set of novel evaluation measures and metrics.

Visual Dialog
