no code implementations • 16 Apr 2024 • Zaid Khan, Yun Fu
We find that neighborhood consistency can be used to identify model responses to visual questions that are likely unreliable, even in adversarial settings or settings that are out-of-distribution to the proxy model.
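The idea above can be sketched in a few lines: generate paraphrased "neighbor" questions, query the model on all of them, and treat low answer agreement as a reliability flag. Everything here (`answer_fn`, the toy model, the example questions) is an illustrative stand-in, not the paper's actual interface or data.

```python
from collections import Counter

def neighborhood_consistency(answer_fn, question, neighbors):
    """Fraction of a question's neighborhood that agrees with the
    majority answer; a low score flags a likely-unreliable response.

    `answer_fn` is a hypothetical callable mapping a question string
    to an answer string.
    """
    answers = [answer_fn(q) for q in [question] + list(neighbors)]
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)

# Toy "model": answers depend only on a keyword, so the semantically
# different last neighbor breaks agreement.
toy_model = lambda q: "red" if "color" in q else "unsure"

score = neighborhood_consistency(
    toy_model,
    "What color is the car?",
    ["Which color is the car?", "What is the car's color?", "Tell me the hue."],
)
```

In this toy case three of four questions agree, giving a consistency score of 0.75; a practical system would threshold this score to abstain on low-consistency answers.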
no code implementations • 6 Apr 2024 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker
We propose a method where we exploit existing annotations for a vision-language task to improvise a coarse reward signal for that task, treat the LLM as a policy, and apply reinforced self-training to improve the visual program synthesis ability of the LLM for that task.
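A minimal sketch of that loop, with toy stand-ins throughout: the "policy" is a weighted table of candidate programs rather than an LLM, and the coarse reward is exact match against an annotation-derived reference. None of these names or structures come from the paper; they only illustrate the sample-score-upweight cycle.

```python
import random

random.seed(0)

# Hypothetical annotation-derived reference program for one task.
annotations = {"count the cats": "count(filter(objects, 'cat'))"}

def coarse_reward(task, program):
    # Deliberately coarse signal: 1.0 on exact match, else 0.0.
    return 1.0 if program == annotations[task] else 0.0

def sample_program(task, policy):
    # Stand-in policy: sample a candidate program by weight.
    programs, weights = zip(*policy[task].items())
    return random.choices(programs, weights=weights)[0]

policy = {"count the cats": {
    "count(filter(objects, 'cat'))": 1.0,
    "count(objects)": 1.0,
}}

# Reinforced self-training, crudely: sample, score, and upweight
# only the rewarded samples for the next round.
task = "count the cats"
for _ in range(20):
    prog = sample_program(task, policy)
    if coarse_reward(task, prog) > 0:
        policy[task][prog] += 1.0
```

After a few rounds the rewarded program dominates the sampling distribution, which is the self-reinforcing effect the method relies on.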
1 code implementation • CVPR 2023 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Xiang Yu, Yun Fu, Manmohan Chandraker
We introduce SelTDA (Self-Taught Data Augmentation), a strategy for finetuning large VLMs on small-scale VQA datasets.
1 code implementation • 21 Mar 2023 • Zaid Khan, Yun Fu
We find that a minimal set of parameter updates (<7%) can achieve the same performance as full-model training, and updating specific components (<1% of parameters) can match 75% of full-model training.
1 code implementation • 27 Mar 2022 • Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu
Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but it ignores fine-grained alignment because the dual-stream architecture aligns image and text representations only at a global level.
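The global-level alignment mentioned here is the standard symmetric contrastive (InfoNCE) objective: each image embedding is pulled toward its own caption and pushed away from all others in the batch, and vice versa. Below is a minimal pure-Python sketch of that loss, not the paper's implementation.

```python
import math

def info_nce(img_embs, txt_embs, temperature=0.07):
    """Symmetric contrastive loss over whole-image / whole-caption
    embeddings -- alignment only at the global level, with no
    region-to-word correspondence."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in img_embs]
    txts = [normalize(v) for v in txt_embs]
    n = len(imgs)
    loss = 0.0
    for i in range(n):
        # Image -> text: the i-th caption is the positive.
        logits = [dot(imgs[i], t) / temperature for t in txts]
        loss += -logits[i] + math.log(sum(math.exp(l) for l in logits))
        # Text -> image, symmetrically.
        logits = [dot(txts[i], v) / temperature for v in imgs]
        loss += -logits[i] + math.log(sum(math.exp(l) for l in logits))
    return loss / (2 * n)

# Matched pairs should score a lower loss than shuffled pairs.
aligned = info_nce([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = info_nce([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

Because the loss sees only one embedding per image and one per caption, it cannot supervise which image regions correspond to which words, which is the fine-grained alignment gap the paper targets.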
no code implementations • 29 Sep 2021 • Zaid Khan, Yun Fu
A commonly held belief in deep-learning-based long-tailed classification is that the representations learned from long-tailed data are "good enough" and that the performance bottleneck is the classification head atop the representation learner.
1 code implementation • 3 Aug 2021 • Zaid Khan, Yun Fu
Our approach increases the amount of text available to the language model and distills the object-level information in complex images.
no code implementations • 3 Feb 2021 • Zaid Khan, Yun Fu
Using the insight that a classifier can learn the racial system encoded by a dataset, we conduct an empirical study of computer vision datasets supplying categorical race labels for face images to determine the cross-dataset consistency and generalization of racial categories.
no code implementations • 28 Jul 2020 • Joseph P. Robinson, Zaid Khan, Yu Yin, Ming Shao, Yun Fu
Thus, to narrow the gap between research and reality and enhance the power of kinship recognition systems, we extend FIW with multimedia (MM) data (i.e., video, audio, and text captions).
2 code implementations • 15 Feb 2020 • Joseph P. Robinson, Yu Yin, Zaid Khan, Ming Shao, Siyu Xia, Michael Stopa, Samson Timoner, Matthew A. Turk, Rama Chellappa, Yun Fu
Recognizing Families In the Wild (RFIW): an annual large-scale, multi-track automatic kinship recognition evaluation that supports various visual kin-based problems at scales far larger than previous efforts.