Zero-Shot Visual Question Answering
3 papers with code • 0 benchmarks • 0 datasets
Benchmarks
These leaderboards are used to track progress in Zero-Shot Visual Question Answering.
Most implemented papers
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.
Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Deep network models are often purely inductive during both training and inference on unseen data.
CoLLaVO: Crayon Large Language and Vision mOdel
Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks.