MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer

24 Apr 2023 · QiHao Zhao, Yangyu Huang, Wei Hu, Fan Zhang, Jun Liu

The recently proposed data augmentation TransMix employs attention labels to help Vision Transformers (ViTs) achieve better robustness and performance. However, TransMix is deficient in two aspects: 1) the image cropping method of TransMix may not be suitable for ViTs; 2) at the early stage of training, the model produces unreliable attention maps, and TransMix uses these unreliable maps to compute mixed attention labels, which can mislead training. To address these issues, we propose MaskMix and Progressive Attention Labeling (PAL) in image space and label space, respectively. In detail, from the perspective of image space, we design MaskMix, which mixes two images based on a patch-like grid mask. In particular, the size of each mask patch is adjustable and is a multiple of the image patch size, which ensures that each image patch comes from only one image and contains more global content. From the perspective of label space, we design PAL, which utilizes a progressive factor to dynamically re-weight the attention weights of the mixed attention label. Finally, we combine MaskMix and Progressive Attention Labeling into our new data augmentation method, named MixPro. The experimental results show that our method can improve various ViT-based models at various scales on ImageNet classification (73.8% top-1 accuracy based on DeiT-T for 300 epochs). After being pre-trained with MixPro on ImageNet, the ViT-based models also demonstrate better transferability to semantic segmentation, object detection, and instance segmentation. Furthermore, compared to TransMix, MixPro also shows stronger robustness on several benchmarks. The code is available at https://github.com/fistyee/MixPro.
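The two components described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions not spelled out in the abstract: the function names, parameter names, and the linear PAL schedule are hypothetical, and the paper's actual sampling and scheduling details may differ. The key MaskMix property shown is that each mask cell spans a multiple of the ViT patch size, so every image patch comes from exactly one source image.

```python
import numpy as np

def maskmix_mask(img_size=224, patch=16, mask_patch_mult=2, lam=0.5, rng=None):
    """Sketch of a MaskMix-style binary grid mask (hypothetical API).

    Each mask cell spans mask_patch_mult * patch pixels, so every ViT
    image patch falls entirely inside one mask cell and is therefore
    taken from only one of the two images being mixed.
    """
    rng = np.random.default_rng() if rng is None else rng
    cell = patch * mask_patch_mult          # mask cell size in pixels
    grid = img_size // cell                 # number of cells per side
    n_cells = grid * grid
    n_from_a = int(round(lam * n_cells))    # cells taken from image A
    flat = np.zeros(n_cells, dtype=np.float32)
    flat[rng.choice(n_cells, n_from_a, replace=False)] = 1.0
    # upsample each cell to pixel resolution
    mask = np.kron(flat.reshape(grid, grid),
                   np.ones((cell, cell), dtype=np.float32))
    return mask  # (img_size, img_size); 1 -> image A, 0 -> image B

def pal_label_weight(attn_lam, area_lam, progress):
    """Progressive Attention Labeling sketch: re-weight the mixing ratio
    from the area-based value toward the attention-based value as
    training progresses (progress in [0, 1]). The exact progressive
    factor used in the paper may differ from this linear schedule."""
    return (1.0 - progress) * area_lam + progress * attn_lam
```

Early in training (`progress` near 0) the label relies on the stable area ratio; as the attention maps become reliable, the attention-derived ratio dominates, which is the intuition PAL builds on.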


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Data Augmentation | ImageNet | DeiT-B (+MixPro) | Accuracy (%) | 82.9 | #1 |
| Data Augmentation | ImageNet | DeiT-S (+MixPro) | Accuracy (%) | 81.3 | #3 |
| Data Augmentation | ImageNet | DeiT-T (+MixPro) | Accuracy (%) | 73.8 | #17 |
| Image Classification | ImageNet | XCiT-M (+MixPro) | Top 1 Accuracy | 84.1% | #325 |
| Image Classification | ImageNet | CA-Swin-S (+MixPro) | Top 1 Accuracy | 83.7% | #365 |
| Image Classification | ImageNet | DeiT-B (+MixPro) | Top 1 Accuracy | 82.9% | #445 |
| Image Classification | ImageNet | CA-Swin-T (+MixPro) | Top 1 Accuracy | 82.8% | #453 |
| Image Classification | ImageNet | PVT-M (+MixPro) | Top 1 Accuracy | 82.7% | #465 |
| Image Classification | ImageNet | PVT-S (+MixPro) | Top 1 Accuracy | 81.2% | #601 |
| Image Classification | ImageNet | CaiT-XXS (+MixPro) | Top 1 Accuracy | 80.6% | #635 |
| Image Classification | ImageNet | PVT-T (+MixPro) | Top 1 Accuracy | 76.7% | #832 |
| Image Classification | ImageNet | DeiT-T (+MixPro) | Top 1 Accuracy | 73.8% | #911 |
