Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering

11 Jul 2023  ·  Pengfei Li, Gang Liu, Jinlong He, Zixu Zhao, Shenjun Zhong ·

Medical visual question answering (VQA) is a challenging task that requires answering clinical questions about a given medical image by jointly reasoning over visual and language information. However, because training data for medical VQA is scarce, the pre-train-and-fine-tune paradigm has become a commonly used strategy for improving model generalization. In this paper, we present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text from medical image-caption datasets, by leveraging both unimodal and multimodal contrastive losses, along with masked language modeling and image-text matching as pre-training objectives. The pre-trained model is then transferred to downstream medical VQA tasks. The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets, with accuracy improvements of 2.2%, 14.7%, and 1.7%, respectively. In addition, we conduct a comprehensive analysis to validate the effectiveness of the individual components of the approach and study different pre-training settings. Our code and models are available at https://github.com/pengfeiliHEU/MUMC.
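The image-text contrastive objective mentioned in the abstract is typically a symmetric InfoNCE loss over a batch of paired image and text embeddings, where matching pairs sit on the diagonal of the similarity matrix. Below is a minimal NumPy sketch of such a loss; the function name, temperature value, and the generic InfoNCE formulation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text InfoNCE contrastive loss (generic sketch,
    not MUMC's exact code). Matching pairs are the diagonal entries."""
    # L2-normalize both sets of embeddings
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    # (B, B) cosine-similarity matrix scaled by temperature
    logits = img @ txt.T / temperature

    def xent_diag(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average of image-to-text and text-to-image directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

In the full pre-training setup described in the abstract, a loss of this form (applied both unimodally and multimodally) would be summed with the masked language modeling and image-text matching losses; the exact weighting is not specified here.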

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Medical Visual Question Answering | PathVQA | MUMC | Free-form Accuracy | 39.0 | #2
Medical Visual Question Answering | PathVQA | MUMC | Yes/No Accuracy | 90.4 | #1
Medical Visual Question Answering | PathVQA | MUMC | Overall Accuracy | 65.1 | #1
Medical Visual Question Answering | SLAKE-English | MUMC | Overall Accuracy | 84.9 | #3
Medical Visual Question Answering | VQA-RAD | MUMC | Close-ended Accuracy | 84.2 | #3
Medical Visual Question Answering | VQA-RAD | MUMC | Open-ended Accuracy | 71.5 | #3
Medical Visual Question Answering | VQA-RAD | MUMC | Overall Accuracy | 79.2 | #3
