MlTr: Multi-label Classification with Transformer

11 Jun 2021 · Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Nian Shi, Honglin Liu ·

The task of multi-label image classification is to recognize all the object labels present in an image. Despite years of progress, small objects, similar objects, and objects with high conditional probability remain the main bottlenecks for previous convolutional neural network (CNN) based models, which are limited by the representational capacity of convolutional kernels. Recent vision transformer networks use the self-attention mechanism to extract features at pixel granularity, which expresses richer local semantic information but is insufficient for mining global spatial dependence. In this paper, we point out three crucial problems that CNN-based methods encounter and explore the possibility of designing specific transformer modules to address them. We put forward a Multi-label Transformer architecture (MlTr) constructed with window partitioning, in-window pixel attention, and cross-window attention, which particularly improves performance on multi-label image classification tasks. The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE, with 88.5%, 95.8%, and 65.5% respectively. The code will be available soon at https://github.com/starmemda/MlTr/
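
The window partitioning step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `partition_windows`, the list-of-rows grid representation, and the requirement that the grid size be divisible by the window size are all assumptions made for illustration. It shows only how a feature grid might be split into non-overlapping tiles; the attention computed inside each tile and across tiles is not shown.

```python
from typing import List, Sequence


def partition_windows(grid: Sequence[Sequence[int]], window: int) -> List[List[List[int]]]:
    """Split an H x W grid (a list of rows) into non-overlapping window x window tiles.

    Hypothetical helper for illustration only; not taken from the MlTr code base.
    Tiles are returned in row-major order (left to right, then top to bottom).
    """
    height, width = len(grid), len(grid[0])
    # Assumption: the grid divides evenly into windows (no padding is applied).
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tile = [list(grid[r][left:left + window]) for r in range(top, top + window)]
            tiles.append(tile)
    return tiles


# A 4 x 4 grid of position indices split into four 2 x 2 windows.
example_grid = [[r * 4 + c for c in range(4)] for r in range(4)]
example_tiles = partition_windows(example_grid, 2)
```

In a design of this kind, attention restricted to each tile would capture local detail, while a second attention stage over the tiles would relate distant regions; how MlTr combines the two is described in the paper itself.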

Code

starmemda/MlTr official

Tasks

Classification

Image Classification

Multi-Label Classification

Multi-Label Image Classification

Datasets

MS COCO

Results from the Paper

Ranked #12 on Multi-Label Classification on MS-COCO

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Multi-Label Classification	MS-COCO	MlTr-XL (ImageNet-21K pretraining, resolution 384)	mAP	90.0	# 12
Multi-Label Classification	MS-COCO	MlTr-L (ImageNet-21K pretraining, resolution 384)	mAP	88.5	# 14
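
The mAP metric reported above is, in the multi-label setting, commonly computed as the mean over labels of each label's average precision. The sketch below shows one common, non-interpolated variant; the function names, the nested-list inputs, and the convention of scoring a label with no positives as 0.0 are assumptions for illustration, and the exact evaluation protocol behind the leaderboard numbers may differ.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Average precision for one label: mean of precision@k over ranks k holding a positive."""
    # Rank images by descending predicted score for this label.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if targets[idx] == 1:
            hits += 1
            precision_sum += hits / rank
    # Assumption: a label with no positive images contributes 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    target_matrix: Sequence[Sequence[int]],
) -> float:
    """Mean of per-label average precision; rows are images, columns are labels."""
    num_labels = len(score_matrix[0])
    per_label = []
    for j in range(num_labels):
        label_scores = [row[j] for row in score_matrix]
        label_targets = [row[j] for row in target_matrix]
        per_label.append(average_precision(label_scores, label_targets))
    return sum(per_label) / num_labels


# Toy data: three images, two labels (made up for illustration, not from the paper).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_targets = [[1, 0], [0, 1], [1, 0]]
toy_map = mean_average_precision(toy_scores, toy_targets)
```

For the toy data, the first label has positives at ranks 1 and 3 (average precision 5/6) and the second has its single positive at rank 1 (average precision 1), so the mean is 11/12.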

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
