no code implementations • 16 Apr 2024 • Zaid Khan, Yun Fu
We find that neighborhood consistency can be used to identify model responses to visual questions that are likely unreliable, even in adversarial settings or settings that are out-of-distribution to the proxy model.
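The idea above can be sketched in a few lines: generate paraphrased "neighbor" questions, query the model on all of them, and treat low answer agreement as a reliability flag. Everything here (`answer_fn`, the toy model, the example questions) is an illustrative stand-in, not the paper's actual interface or data.

```python
from collections import Counter

def neighborhood_consistency(answer_fn, question, neighbors):
    """Fraction of a question's neighborhood that agrees with the
    majority answer; a low score flags a likely-unreliable response.

    `answer_fn` is a hypothetical callable mapping a question string
    to an answer string.
    """
    answers = [answer_fn(q) for q in [question] + list(neighbors)]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)

# Toy "model": answers depend only on a keyword, so the semantically
# different last neighbor breaks agreement.
toy_model = lambda q: "red" if "color" in q else "unsure"

score = neighborhood_consistency(
    toy_model,
    "What color is the car?",
    ["Which color is the car?", "What is the car's color?", "Tell me the hue."],
)
```

In this toy case three of four questions agree, giving a consistency score of 0.75; a practical system would threshold this score to abstain on low-consistency answers.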
no code implementations • 6 Apr 2024 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker
We propose a method where we exploit existing annotations for a vision-language task to improvise a coarse reward signal for that task, treat the LLM as a policy, and apply reinforced self-training to improve the visual program synthesis ability of the LLM for that task.
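A minimal sketch of that loop, with toy stand-ins throughout: the "policy" is a weighted table of candidate programs rather than an LLM, and the coarse reward is exact match against an annotation-derived reference. None of these names or structures come from the paper; they only illustrate the sample-score-upweight cycle.

```python
import random

random.seed(0)

# Hypothetical annotation-derived reference program for one task.
annotations = {"count the cats": "count(filter(objects, 'cat'))"}

def coarse_reward(task, program):
    # Deliberately coarse signal: 1.0 on exact match, else 0.0.
    return 1.0 if program == annotations[task] else 0.0

def sample_program(task, policy):
    # Stand-in policy: sample a candidate program by weight.
    programs, weights = zip(*policy[task].items())
    return random.choices(programs, weights=weights)[0]

policy = {"count the cats": {
    "count(filter(objects, 'cat'))": 1.0,
    "count(objects)": 1.0,
}}

# Reinforced self-training, crudely: sample, score, and upweight
# only the rewarded samples for the next round.
task = "count the cats"
for _ in range(20):
    prog = sample_program(task, policy)
    if coarse_reward(task, prog) > 0:
        policy[task][prog] += 1.0
```

After a few rounds the rewarded program dominates the sampling distribution, which is the self-reinforcing effect the method relies on.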
1 code implementation • CVPR 2023 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Xiang Yu, Yun Fu, Manmohan Chandraker
We introduce SelTDA (Self-Taught Data Augmentation), a strategy for finetuning large VLMs on small-scale VQA datasets.
1 code implementation • 21 Mar 2023 • Zaid Khan, Yun Fu
We find that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training, and updating specific components (<1% of parameters) can match 75% of full-model training.
1 code implementation • 27 Mar 2022 • Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu
Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but it ignores fine-grained alignment because the dual-stream architecture aligns image and text representations only at a global level.
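The global-level alignment mentioned here is the standard symmetric contrastive (InfoNCE) objective: each image embedding is pulled toward its own caption and pushed away from all others in the batch, and vice versa. Below is a minimal pure-Python sketch of that loss, not the paper's implementation.

```python
import math

def info_nce(img_embs, txt_embs, temperature=0.07):
    """Symmetric contrastive loss over whole-image / whole-caption
    embeddings -- alignment only at the global level, with no
    region-to-word correspondence."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in img_embs]
    txts = [normalize(v) for v in txt_embs]
    n = len(imgs)
    loss = 0.0
    for i in range(n):
        # Image -> text: the i-th caption is the positive.
        logits = [dot(imgs[i], t) / temperature for t in txts]
        loss += -logits[i] + math.log(sum(math.exp(l) for l in logits))
        # Text -> image, symmetrically.
        logits = [dot(txts[i], v) / temperature for v in imgs]
        loss += -logits[i] + math.log(sum(math.exp(l) for l in logits))
    return loss / (2 * n)

# Matched pairs should score a lower loss than shuffled pairs.
aligned = info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

Because the loss sees only one embedding per image and one per caption, it cannot supervise which image regions correspond to which words, which is the fine-grained alignment gap the paper targets.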
no code implementations • 29 Sep 2021 • Zaid Khan, Yun Fu
A commonly held belief in deep-learning-based long-tailed classification is that the representations learned from long-tailed data are "good enough" and that the performance bottleneck is the classification head atop the representation learner.
1 code implementation • 3 Aug 2021 • Zaid Khan, Yun Fu
Our approach increases the amount of text available to the language model and distills the object-level information in complex images.
no code implementations • 3 Feb 2021 • Zaid Khan, Yun Fu
Using the insight that a classifier can learn the racial system encoded by a dataset, we conduct an empirical study of computer vision datasets supplying categorical race labels for face images to determine the cross-dataset consistency and generalization of racial categories.
no code implementations • 28 Jul 2020 • Joseph P. Robinson, Zaid Khan, Yu Yin, Ming Shao, Yun Fu
Thus, to narrow the gap between research and reality and enhance the power of kinship recognition systems, we extend FIW with multimedia (MM) data (i.e., video, audio, and text captions).
2 code implementations • 15 Feb 2020 • Joseph P. Robinson, Yu Yin, Zaid Khan, Ming Shao, Siyu Xia, Michael Stopa, Samson Timoner, Matthew A. Turk, Rama Chellappa, Yun Fu
Recognizing Families In the Wild (RFIW): an annual large-scale, multi-track automatic kinship recognition evaluation that supports various visual kin-based problems at scales far larger than previous efforts.