The MMVP (Multimodal Visual Patterns) benchmark focuses on identifying "CLIP-blind pairs": images that the CLIP model treats as similar despite clear visual differences. These pairs expose how multimodal models that rely on CLIP struggle with straightforward visual questions, often giving incorrect answers and hallucinated explanations.
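To make the idea of a CLIP-blind pair concrete, the sketch below flags image pairs whose CLIP embeddings are nearly identical while a second, reference encoder keeps them apart. It is a minimal illustration under stated assumptions, not the benchmark's actual pipeline: the function names, the use of a reference encoder, the threshold values, and the toy embeddings are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_blind_pairs(clip_emb, ref_emb, clip_min=0.95, ref_max=0.6):
    """Return index pairs (i, j) that CLIP embeds as near-duplicates
    (similarity above clip_min) while the reference encoder separates
    them (similarity below ref_max). Both thresholds are assumptions."""
    pairs = []
    for i, j in combinations(range(len(clip_emb)), 2):
        clip_sim = cosine(clip_emb[i], clip_emb[j])
        ref_sim = cosine(ref_emb[i], ref_emb[j])
        if clip_sim > clip_min and ref_sim < ref_max:
            pairs.append((i, j))
    return pairs


# Toy embeddings, made up for illustration (not real model outputs).
clip_emb = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
ref_emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
blind_pairs = find_blind_pairs(clip_emb, ref_emb)
print(blind_pairs)
```

In practice the embeddings would come from running both encoders over the same image set. The pairwise loop is quadratic in the number of images, which is fine for a sketch but would need batching or an approximate nearest-neighbour index at scale.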

License


  • Unknown
