Search Results for author: Zaid Khan

Found 10 papers, 5 papers with code

Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering

no code implementations · 16 Apr 2024 · Zaid Khan, Yun Fu

We find that neighborhood consistency can be used to identify model responses to visual questions that are likely unreliable, even in adversarial settings or settings that are out-of-distribution to the proxy model.

Language Modelling Question Answering +1

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

no code implementations · 6 Apr 2024 · Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker

We propose a method where we exploit existing annotations for a vision-language task to improvise a coarse reward signal for that task, treat the LLM as a policy, and apply reinforced self-training to improve the visual program synthesis ability of the LLM for that task.

object-detection Object Detection +4

Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning

1 code implementation · 21 Mar 2023 · Zaid Khan, Yun Fu

We find that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training, and updating specific components (<1% of parameters) can match 75% of full-model training.

Language Modelling Transfer Learning

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

1 code implementation · 27 Mar 2022 · Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu

Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture that aligns image and text representations only on a global level.

Question Answering Referring Expression +4

Where is the bottleneck in long-tailed classification?

no code implementations · 29 Sep 2021 · Zaid Khan, Yun Fu

A commonly held belief in deep-learning based long-tailed classification is that the representations learned from long-tailed data are "good enough" and the performance bottleneck is the classification head atop the representation learner.

Classification Data Augmentation

Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation

1 code implementation · 3 Aug 2021 · Zaid Khan, Yun Fu

Our approach increases the amount of text available to the language model and distills the object-level information in complex images.

Language Modelling Multimodal Sentiment Analysis +4

One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision

no code implementations · 3 Feb 2021 · Zaid Khan, Yun Fu

Using the insight that a classifier can learn the racial system encoded by a dataset, we conduct an empirical study of computer vision datasets supplying categorical race labels for face images to determine the cross-dataset consistency and generalization of racial categories.

Benchmarking Fairness

Families In Wild Multimedia: A Multimodal Database for Recognizing Kinship

no code implementations · 28 Jul 2020 · Joseph P. Robinson, Zaid Khan, Yu Yin, Ming Shao, Yun Fu

Thus, to narrow the gap between research and reality and enhance the power of kinship recognition systems, we extend FIW with multimedia (MM) data (i.e., video, audio, and text captions).

Recognizing Families In the Wild: White Paper for the 4th Edition Data Challenge

2 code implementations · 15 Feb 2020 · Joseph P. Robinson, Yu Yin, Zaid Khan, Ming Shao, Siyu Xia, Michael Stopa, Samson Timoner, Matthew A. Turk, Rama Chellappa, Yun Fu

Recognizing Families In the Wild (RFIW) is an annual large-scale, multi-track automatic kinship recognition evaluation that supports various visual kin-based problems at scales much larger than ever before.

Gesture Recognition Kinship Verification +1
