Visual Commonsense Reasoning
29 papers with code • 7 benchmarks • 7 datasets
Latest papers
Towards artificial general intelligence via a multimodal foundation model
To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained on large-scale multimodal data, which can be quickly adapted to a variety of downstream cognitive tasks.
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Commonsense is defined as the knowledge that is shared by everyone.
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics
Nevertheless, there has been no open-source codebase that supports training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion.
Interpretable Visual Understanding with Cognitive Attention Network
While recognition-level image understanding has achieved remarkable advances, reliable visual scene understanding requires comprehensive image understanding at not only the recognition level but also the cognition level, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge.
Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory
Moreover, the proposed model provides an intuitive interpretation of visual commonsense reasoning.
MERLOT: Multimodal Neural Script Knowledge Models
As humans, we understand events in the visual world contextually, performing multimodal reasoning across time to make inferences about the past, present, and future.
Unifying Vision-and-Language Tasks via Text Generation
On 7 popular vision-and-language benchmarks, including visual question answering, referring expression comprehension, and visual commonsense reasoning, most of which have previously been modeled as discriminative tasks, our generative approach (with a single unified architecture) reaches performance comparable to recent task-specific state-of-the-art vision-and-language models.
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights.
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning.
TAB-VCR: Tags and Attributes based VCR Baselines
Despite impressive recent progress reported on tasks that require reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets.