TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Fine-Grained Image Classification	CUB-200-2011	DCAL	Accuracy	92.0%	# 6
Fine-Grained Image Classification	FGVC-Aircraft	DCAL	Accuracy	93.3%	# 23
Fine-Grained Image Classification	Stanford Cars	DCAL	Accuracy	95.3%	# 13

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dual-cross-attention-learning-for-fine/fine-grained-image-classification-on-cub-200)](https://paperswithcode.com/sota/fine-grained-image-classification-on-cub-200?p=dual-cross-attention-learning-for-fine)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dual-cross-attention-learning-for-fine/fine-grained-image-classification-on-stanford)](https://paperswithcode.com/sota/fine-grained-image-classification-on-stanford?p=dual-cross-attention-learning-for-fine)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dual-cross-attention-learning-for-fine/fine-grained-image-classification-on-fgvc)](https://paperswithcode.com/sota/fine-grained-image-classification-on-fgvc?p=dual-cross-attention-learning-for-fine)`

Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification

CVPR 2022 · Haowei Zhu, Wenjing Ke, Dong Li, Ji Liu, Lu Tian, Yi Shan

Self-attention mechanisms have recently shown impressive performance in various NLP and CV tasks, where they help capture sequential characteristics and derive global information. In this work, we explore how to extend self-attention modules to better learn subtle feature embeddings for recognizing fine-grained objects, e.g., different bird species or person identities. To this end, we propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning. First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions, which helps reinforce the spatial-wise discriminative clues for recognition. Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs. PWCA regularizes the attention learning of an image by treating another image as a distractor, and it is removed during inference. We observe that DCAL can reduce misleading attentions and diffuse the attention response to discover more complementary parts for recognition. We conduct extensive evaluations on fine-grained visual categorization and object re-identification. Experiments demonstrate that DCAL performs on par with state-of-the-art methods and consistently improves multiple self-attention baselines, e.g., surpassing DeiT-Tiny and ViT-Base by 2.8% and 2.4% mAP on MSMT17, respectively.
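
The two cross-attention ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the single-head formulation, the toy shapes, and in particular the way high-response tokens are scored (here, the attention row of an assumed class token at index 0) are all assumptions made for illustration. The sketch only shows the general shape of the computation: in GLCA a subset of high-response queries attends to all tokens of the same image, and in PWCA the queries of one image attend over the keys and values of both images in a pair.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention; returns output and weights."""
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d))
    return weights @ v, weights

def global_local_cross_attention(q, k, v, response, num_local):
    """GLCA-style sketch: only the highest-response queries attend to all tokens.

    `response` is a per-token saliency score. How it is obtained is an
    assumption of this sketch, not something stated in the abstract.
    """
    top = np.argsort(response)[::-1][:num_local]  # indices of top-scoring tokens
    out, _ = attention(q[top], k, v)
    return out, top

def pair_wise_cross_attention(q1, k1, v1, k2, v2):
    """PWCA-style sketch: image-1 queries attend over keys/values of both images."""
    k = np.concatenate([k1, k2], axis=0)
    v = np.concatenate([v1, v2], axis=0)
    return attention(q1, k, v)

# Toy inputs (hypothetical sizes): 6 tokens per image, 8-dimensional embeddings.
rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
q1, k1, v1 = (rng.standard_normal((n_tokens, dim)) for _ in range(3))
k2, v2 = (rng.standard_normal((n_tokens, dim)) for _ in range(2))

# Plain self-attention on image 1; this is all that would remain at inference
# once the pair-wise branch is dropped.
sa_out, sa_weights = attention(q1, k1, v1)

# Assumption: token 0 plays the role of a class token, and its attention row
# serves as a stand-in saliency score for selecting local queries.
response = sa_weights[0]
glca_out, selected = global_local_cross_attention(q1, k1, v1, response, num_local=2)

# The second image acts as a distractor: part of each query's attention mass
# is spent on its tokens.
pwca_out, pwca_weights = pair_wise_cross_attention(q1, k1, v1, k2, v2)
```

In this sketch each row of `pwca_weights` spans the tokens of both images, so the share of attention left for the image's own tokens is strictly below one. That is one way to read the regularizing role of the distractor image described above.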

Code

No code implementations yet.

Tasks

Fine-Grained Image Classification

Fine-Grained Visual Categorization

Datasets

ImageNet

CUB-200-2011

Market-1501

Stanford Cars

FGVC-Aircraft

Results from the Paper

Ranked #6 on Fine-Grained Image Classification on CUB-200-2011 (using extra training data)

Methods

No methods listed for this paper.
