1 code implementation • 4 Apr 2024 • Vishaal Udandarao, Ameya Prabhu, Adhiraj Ghosh, Yash Sharma, Philip H. S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge
Web-crawled pretraining datasets underlie the impressive "zero-shot" evaluation performance of multimodal models, such as CLIP for classification/retrieval and Stable-Diffusion for image generation.
1 code implementation • 29 Feb 2024 • Ameya Prabhu, Vishaal Udandarao, Philip Torr, Matthias Bethge, Adel Bibi, Samuel Albanie
However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies.
1 code implementation • 12 Oct 2023 • Vishaal Udandarao, Max F. Burg, Samuel Albanie, Matthias Bethge
This finding points to a blind spot in current frontier VLMs: they excel in recognizing semantic content but fail to acquire an understanding of visual data-types through scaling.
2 code implementations • ICCV 2023 • Vishaal Udandarao, Ankush Gupta, Samuel Albanie
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models.
1 code implementation • 20 Jul 2020 • Surabhi S. Nath, Vishaal Udandarao, Jainendra Shukla
We build a novel algorithm for mapping categorical and dimensional model labels using annotation transfer across affective facial image datasets.
no code implementations • 16 Jun 2020 • Vishaal Udandarao, Mohit Agrawal, Rajesh Kumar, Rajiv Ratn Shah
On the other hand, for regression tasks, we evaluated three ML and four DL-based regressors.
1 code implementation • 10 Jun 2020 • Sarthak Bhagat, Vishaal Udandarao, Shagun Uppal
Disentangling the underlying feature attributes within an image with no prior supervision is a challenging task.
1 code implementation • 7 May 2020 • Vishaal Udandarao, Abhishek Maiti, Deepak Srivatsav, Suryatej Reddy Vyalla, Yifang Yin, Rajiv Ratn Shah
In this paper, we present a novel framework, COBRA, that aims to train two modalities (image and text) jointly, inspired by the Contrastive Predictive Coding (CPC) and Noise Contrastive Estimation (NCE) paradigms, which preserve both inter- and intra-class relationships.
no code implementations • 12 Nov 2019 • Abhishek Agarwal, Nikhil Sachdeva, Raj Kamal Yadav, Vishaal Udandarao, Vrinda Mittal, Anubha Gupta, Abhinav Mathur
Most existing question answering models can be broadly grouped into two categories: i) open-domain question answering models that answer generic questions and use a large-scale knowledge base along with targeted web-corpus retrieval, and ii) closed-domain question answering models that address a focused question area and use complex deep learning models.
1 code implementation • 27 Oct 2019 • Suryatej Reddy Vyalla, Vishaal Udandarao, Tanmoy Chakraborty
The dataset consists of 1.1 million meme captions from 128 classes.