no code implementations • 28 Jun 2023 • Zhenlin Xu, Yi Zhu, Tiffany Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo
This paper introduces innovative benchmarks to evaluate Vision-Language Models (VLMs) in real-world zero-shot recognition tasks, focusing on the granularity and specificity of prompting text.