1 code implementation • 7 Feb 2024 • Jirayu Burapacheep, Ishan Gaur, Agam Bhatia, Tristan Thrush
We evaluate image-text matching (ITM) models and visual language models (VLMs) and find that even the latest ones are still not robust at this task.
Image Generation • Image-text matching