15 Jun 2023 • Ishaan Singh Rawal, Shantanu Jaiswal, Basura Fernando, Cheston Tan
We evaluate models on CLAVI and find that all of them perform well on multimodal shortcut instances, but most perform poorly on the counterfactual instances that require joint multimodal understanding.