
Visual Question Answering

198 papers with code · Computer Vision

Visual Question Answering is a semantic task that aims to answer natural-language questions about the content of a given image.

Source: Robust Explanations for Visual Question Answering


Greatest papers with code

Learning to Reason: End-to-End Module Networks for Visual Question Answering

ICCV 2017 tensorflow/models

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems.

VISUAL QUESTION ANSWERING
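The core idea of module networks — decomposing a question into a layout of reusable sub-modules that are composed at runtime — can be illustrated with a toy sketch. This is not the paper's architecture: the module names, the fixed layout, and all parameters below are made-up stand-ins.

```python
# Toy illustration of composing neural modules (hypothetical modules,
# random stand-in parameters; not the End-to-End Module Networks model).
import numpy as np

rng = np.random.default_rng(0)
image_regions = rng.random((5, 8))  # 5 region feature vectors, dim 8

def find(regions, w_concept):
    """Soft attention over regions that match a concept vector."""
    scores = regions @ w_concept
    att = np.exp(scores - scores.max())
    return att / att.sum()

def describe(regions, attention, w_answer):
    """Produce answer logits from the attended region features."""
    pooled = attention @ regions
    return pooled @ w_answer

w_concept = rng.random(8)       # stand-in parameters for a find[...] module
w_answer = rng.random((8, 3))   # stand-in answer classifier (3 answers)

# A fixed layout standing in for a parsed question: describe(find(...))
att = find(image_regions, w_concept)
logits = describe(image_regions, att, w_answer)
answer = int(np.argmax(logits))
```

In the actual model, the layout (which modules to apply, and in what order) is itself predicted from the question rather than hard-coded as above.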

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

IJCNLP 2019 huggingface/transformers

In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.

LANGUAGE MODELLING QUESTION ANSWERING VISUAL QUESTION ANSWERING VISUAL REASONING

ParlAI: A Dialog Research Software Platform

EMNLP 2017 facebookresearch/ParlAI

We introduce ParlAI (pronounced "par-lay"), an open-source software platform for dialog research implemented in Python, available at http://parl.ai.

VISUAL QUESTION ANSWERING

Hadamard Product for Low-rank Bilinear Pooling

14 Oct 2016 facebookresearch/ParlAI

Bilinear models provide rich representations compared with linear models.

VISUAL QUESTION ANSWERING
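The gist of low-rank bilinear pooling: instead of learning a full bilinear form x^T W_k y for every output feature k, both inputs are projected into a shared low-rank space and combined with an elementwise (Hadamard) product. The sketch below uses illustrative dimensions and random weights, not the paper's trained model.

```python
# Minimal sketch of low-rank bilinear pooling via a Hadamard product
# (illustrative dimensions; random stand-in parameters).
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(16)   # e.g. a question embedding
y = rng.random(32)   # e.g. an image embedding
d, out = 8, 4        # latent rank and output size

U = rng.random((16, d))
V = rng.random((32, d))
P = rng.random((d, out))

z = np.tanh(x @ U) * np.tanh(y @ V)  # Hadamard product in the joint space
f = z @ P                            # low-rank bilinear features
```

The parameter count drops from out × 16 × 32 for a full bilinear form to 16d + 32d + d × out here, which is what makes bilinear interactions practical at VQA scale.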

Towards VQA Models That Can Read

CVPR 2019 facebookresearch/pythia

We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset.

VISUAL QUESTION ANSWERING

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

26 Jul 2018 facebookresearch/mmf

We demonstrate that by making subtle but important changes to the model architecture and the learning rate schedule, fine-tuning image features, and adding data augmentation, we can significantly improve the performance of the up-down model on the VQA v2.0 dataset -- from 65.67% to 70.22%.

DATA AUGMENTATION VISUAL QUESTION ANSWERING

Bilinear Attention Networks

NeurIPS 2018 facebookresearch/mmf

In this paper, we propose bilinear attention networks (BAN) that find bilinear attention distributions to utilize given vision-language information seamlessly.

VISUAL QUESTION ANSWERING
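A bilinear attention distribution can be sketched as a joint softmax over every (question word, image region) pair, with the attended feature pooled from both sides at once. This is a simplified illustration with random stand-in weights, not the BAN model itself.

```python
# Sketch of a bilinear attention map over word-region pairs
# (simplified; random stand-in parameters, not the BAN architecture).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 16))   # 6 question-word features
Y = rng.random((5, 24))   # 5 image-region features

U = rng.random((16, 8))   # project both modalities to a shared space
V = rng.random((24, 8))
Xp = np.tanh(X @ U)       # (6, 8)
Yp = np.tanh(Y @ V)       # (5, 8)

logits = Xp @ Yp.T                      # bilinear interaction, (6, 5)
A = np.exp(logits - logits.max())
A = A / A.sum()                         # joint distribution over all pairs

# Attended joint feature: f_k = sum_ij A_ij * Xp[i,k] * Yp[j,k]
f = np.einsum('ij,ik,jk->k', A, Xp, Yp)
```

Because A is a distribution over pairs rather than over one modality alone, every channel of f mixes information from both the question and the image.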

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

CVPR 2018 facebookresearch/pythia

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

IMAGE CAPTIONING VISUAL QUESTION ANSWERING
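The mechanism described above reduces to a simple recipe: bottom-up region features (e.g. from an object detector) are re-weighted by a top-down signal derived from the question. A minimal numpy sketch, with random stand-in features in place of detector outputs and a learned question encoder:

```python
# Minimal sketch of top-down attention over bottom-up region features
# (random stand-in features; not the full up-down model).
import numpy as np

rng = np.random.default_rng(0)
regions = rng.random((36, 12))  # bottom-up: e.g. 36 detected-region features
q = rng.random(12)              # top-down: question representation

scores = regions @ q            # relevance of each region to the question
att = np.exp(scores - scores.max())
att = att / att.sum()           # attention distribution over regions

v = att @ regions               # question-conditioned image vector
```

The same attended vector v then feeds either a caption decoder or an answer classifier, which is why one attention mechanism serves both tasks in the paper's title.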