Visual Entailment
27 papers with code • 3 benchmarks • 3 datasets
Visual Entailment (VE) is a task over image-sentence pairs in which the premise is an image rather than a natural language sentence, as in traditional Textual Entailment tasks. The goal is to predict whether the image semantically entails the text.
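The sketch below illustrates one common way to set this up, assuming CLIP as a frozen image/text encoder and an untrained, illustrative three-way head over entailment / neutral / contradiction (the label set used by SNLI-VE). The model name, head design, and file path are assumptions, not a reference implementation.

```python
# A minimal Visual Entailment sketch: CLIP embeddings for image (premise) and
# text (hypothesis), fed to a 3-way classifier head that would be trained on
# a VE dataset such as SNLI-VE. Everything task-specific here is illustrative.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

LABELS = ["entailment", "neutral", "contradiction"]

class VEClassifier(nn.Module):
    def __init__(self, clip_name="openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        dim = self.clip.config.projection_dim  # shared embedding size
        # Head over [image; text; |image - text|]; untrained in this sketch.
        self.head = nn.Linear(3 * dim, len(LABELS))

    def forward(self, pixel_values, input_ids, attention_mask):
        img = self.clip.get_image_features(pixel_values=pixel_values)
        txt = self.clip.get_text_features(input_ids=input_ids,
                                          attention_mask=attention_mask)
        feats = torch.cat([img, txt, (img - txt).abs()], dim=-1)
        return self.head(feats)

processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = VEClassifier()
inputs = processor(text=["two dogs are playing in the snow"],
                   images=Image.open("example.jpg"),  # placeholder image path
                   return_tensors="pt", padding=True)
logits = model(inputs["pixel_values"], inputs["input_ids"],
               inputs["attention_mask"])
print(LABELS[logits.argmax(-1).item()])  # arbitrary until the head is trained
```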
Libraries
Use these libraries to find Visual Entailment models and implementations

Latest papers
Prompt Tuning for Generative Multimodal Pretrained Models
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural language pretraining and even vision pretraining.
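A minimal sketch of the soft-prompt-tuning idea follows, using a frozen GPT-2 as a stand-in for a generative multimodal pretrained model: the pretrained weights stay fixed and only the prepended prompt embeddings receive gradients. The prompt length, initialization, and choice of GPT-2 are illustrative assumptions.

```python
# Soft prompt tuning sketch: freeze the pretrained model, learn a small set
# of continuous prompt vectors prepended to the input embeddings.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
for p in model.parameters():          # freeze all pretrained weights
    p.requires_grad = False

n_prompt = 20                         # assumed prompt length
soft_prompt = nn.Parameter(torch.randn(n_prompt, model.config.n_embd) * 0.02)

def prompt_tuning_loss(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    tok_emb = model.transformer.wte(ids)                       # token embeddings
    inputs = torch.cat([soft_prompt.unsqueeze(0), tok_emb], dim=1)
    # Prompt positions get label -100 so the LM loss ignores them.
    labels = torch.cat([torch.full((1, n_prompt), -100), ids], dim=1)
    return model(inputs_embeds=inputs, labels=labels).loss

optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)  # tunes only the prompt
loss = prompt_tuning_loss("an example training sentence")
loss.backward()
optimizer.step()
```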
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
It contains a Chunk-aware Semantic Interactor (arr. CSI), a relation inferrer, and a Lexical Constraint-aware Generator (arr. LeCG).
MixGen: A New Multi-Modal Data Augmentation
Data augmentation is a necessity to enhance data efficiency in deep learning.
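The sketch below shows the MixGen-style augmentation described in the paper: interpolate two images pixel-wise and concatenate their paired texts to form a new image-text pair. The tensor shapes and the mixing coefficient of 0.5 follow the paper's description but are assumptions here.

```python
# MixGen-style multimodal augmentation sketch: mix images linearly,
# concatenate the corresponding captions.
import torch

def mixgen(images, texts, lam=0.5):
    """images: (B, C, H, W) float tensor; texts: list of B strings."""
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1 - lam) * images[perm]    # pixel-level mixup
    mixed_texts = [t + " " + texts[j] for t, j in zip(texts, perm.tolist())]
    return mixed_images, mixed_texts

imgs = torch.rand(4, 3, 224, 224)
caps = ["a dog in the snow", "a red car", "two children", "a bowl of fruit"]
aug_imgs, aug_caps = mixgen(imgs, caps)
```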
CoCa: Contrastive Captioners are Image-Text Foundation Models
We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the multimodal decoder outputs which predicts text tokens autoregressively.
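A minimal sketch of these two objectives, assuming precomputed unimodal image/text embeddings and multimodal decoder logits; the temperature, tensor shapes, and equal loss weighting are illustrative, not CoCa's exact configuration.

```python
# CoCa-style training objective sketch: InfoNCE contrastive loss on unimodal
# embeddings plus an autoregressive captioning (cross-entropy) loss.
import torch
import torch.nn.functional as F

def coca_loss(img_emb, txt_emb, decoder_logits, caption_ids, temperature=0.07):
    # Contrastive loss over L2-normalized unimodal embeddings, shape (B, D).
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sims = img_emb @ txt_emb.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))
    contrastive = (F.cross_entropy(sims, targets) +
                   F.cross_entropy(sims.t(), targets)) / 2
    # Captioning loss: predict token t+1 from tokens <= t (teacher forcing).
    captioning = F.cross_entropy(
        decoder_logits[:, :-1].reshape(-1, decoder_logits.size(-1)),
        caption_ids[:, 1:].reshape(-1))
    return contrastive + captioning   # equal weighting assumed in this sketch

B, D, T, V = 8, 512, 16, 1000         # illustrative sizes
loss = coca_loss(torch.randn(B, D), torch.randn(B, D),
                 torch.randn(B, T, V), torch.randint(0, V, (B, T)))
```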
Visual Spatial Reasoning
Spatial relations are a basic part of human cognition.
Fine-Grained Visual Entailment
In this paper, we propose an extension of this task, where the goal is to predict the logical relationship of fine-grained knowledge elements within a piece of text to an image.
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Current NLE models explain the decision-making process of a vision or vision-language model (a.k.a. the task model), e.g., a VQA model, via a language model (a.k.a. the explanation model), e.g., GPT.
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.
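The sketch below illustrates the unified idea: every task, including Visual Entailment, is phrased as a textual instruction and answered as plain text by one sequence-to-sequence model, so no task-specific heads are needed. The instruction templates are illustrative, not OFA's exact prompts.

```python
# Unified seq2seq task-formatting sketch: tasks become instruction strings,
# predictions become decoded text. Templates here are assumptions.
INSTRUCTIONS = {
    "caption":    "what does the image describe?",
    "vqa":        "{question}",
    "entailment": 'does the image describe "{text}"?',  # yes / no / maybe
}

def build_input(task, **fields):
    return INSTRUCTIONS[task].format(**fields)

src = build_input("entailment", text="two dogs play in the snow")
# The (image, src) pair would be fed to a single encoder-decoder model; the
# decoded string is the prediction, with no task-specific classifier.
```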
Distilled Dual-Encoder Model for Vision-Language Understanding
We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.
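A minimal sketch of a cross-modal attention distillation term, assuming access to attention probability maps from a fusion-encoder teacher and a dual-encoder student over the same image-text pair; the shapes, names, and KL formulation are assumptions for illustration.

```python
# Attention distillation sketch: train the student to match the teacher's
# cross-modal attention distributions via KL divergence.
import torch
import torch.nn.functional as F

def attention_distill_loss(student_attn, teacher_attn, eps=1e-8):
    """Both inputs: (B, heads, Q, K) attention probabilities over tokens."""
    return F.kl_div((student_attn + eps).log(), teacher_attn,
                    reduction="batchmean")

B, H, Q, K = 2, 8, 16, 16             # illustrative sizes
teacher = torch.softmax(torch.randn(B, H, Q, K), dim=-1)
student = torch.softmax(torch.randn(B, H, Q, K), dim=-1)
loss = attention_distill_loss(student, teacher)
# Minimized during training alongside the usual task loss.
```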
Check It Again: Progressive Visual Question Answering via Visual Entailment
Moreover, existing VQA models only explore the interaction between the image and the question, ignoring the semantics of candidate answers.