Zero-Shot Visual Question Answering

3 papers with code • 0 benchmarks • 0 datasets

Most implemented papers

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

yuweihao/mm-vet 4 Aug 2023

Problems in evaluating large multimodal models include: (1) how to systematically structure and evaluate complicated multimodal tasks; (2) how to design evaluation metrics that work well across question and answer types; and (3) how to provide insights into a model beyond a simple performance ranking.

Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

ellenzhuwang/implicit_vkood NeurIPS 2023

Deep network models are often purely inductive during both training and inference on unseen data.

CoLLaVO: Crayon Large Language and Vision mOdel

ByungKwanLee/CoLLaVO 17 Feb 2024

Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks.