Fine-Grained Image Classification

172 papers with code • 35 benchmarks • 36 datasets

Fine-Grained Image Classification is a computer vision task whose goal is to classify images into subcategories within a larger category, for example distinguishing different species of birds or different types of flowers. The task is called fine-grained because the model must pick up subtle differences in visual appearance and patterns, which makes it more challenging than regular image classification.

(Image credit: Looking for the Devil in the Details)

Benchmarks


These leaderboards track progress in Fine-Grained Image Classification.

Dataset	Best Model
Stanford Cars	CMAL-Net
CUB-200-2011	HERBS
FGVC Aircraft	SR-GNN
Oxford 102 Flowers	VIT-L/16 (Background)
CUB-200-2011	HERBS
NABirds	MetaFormer (MetaFormer-2,384)
Oxford-IIIT Pet Dataset	OmniVec
Stanford Dogs	SR-GNN
Food-101	CAP
Caltech-101	VIT-L/16
Oxford-IIIT Pets	EffNet-L2 (SAM)
CompCars	ResNet101-swp
Birdsnap	EffNet-L2 (SAM)
Bird-225	WideResNet-101 (Spinal FC)
SUN397	µ2Net (ViT-L/16)
10 Monkey Species	Inception-v3 (Spinal FC)
Fruits-360	ResNeXt-101
FoodX-251	CSWin-L
Imbalanced CUB-200-2011	PC-Softmax
SOP	Assemble-ResNet-FGVC-50
Con-Text	PHOC descriptor + Fisher Vector Encoding
Bottles	PHOC descriptor + Fisher Vector Encoding
MNIST	Vanilla FC layer only
EMNIST-Digits	VGG-5
EMNIST-Letters	VGG-5
QMNIST	VGG-5
Kuzushiji-MNIST	VGG-5
STL-10	Pre trained wide-resnet-101
BoxCars116K	ResNet152 + COOC
CarFlag-1532	ResNet101-swp
CarFlag-563	ResNet101-swp
iNaturalist	TASN
FGVC-Aircraft	EnGraf-Net101 (G=4, H=1)
Herbarium 2021 Half–Earth	Conviformer-B
Herbarium 2022	Conviformer-B
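Leaderboards like these usually rank models by a single classification score. Top-1 accuracy is common, but the metric varies by benchmark, so treat that as an assumption and check each leaderboard. The minimal Python sketch below computes top-1 accuracy together with mean per-class accuracy. The second metric averages the accuracy of each class so that rare classes count as much as common ones, which may matter for skewed sets such as Imbalanced CUB-200-2011. The function names and the toy bird labels are hypothetical.

```python
from collections import defaultdict


def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the true class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def mean_per_class_accuracy(predictions, labels):
    """Average of per-class accuracies (each true class weighted equally)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # number of correct predictions per true class
    for p, y in zip(predictions, labels):
        totals[y] += 1
        hits[y] += int(p == y)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced example: a model that always predicts the majority class.
labels = ["sparrow"] * 8 + ["warbler"] * 2
predictions = ["sparrow"] * 10
print(top1_accuracy(predictions, labels))            # 0.8
print(mean_per_class_accuracy(predictions, labels))  # 0.5
```

The gap between 0.8 and 0.5 shows why a majority-class predictor can look strong under top-1 accuracy on imbalanced data while mean per-class accuracy exposes that it never recognizes the rare class.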


Libraries

Use these libraries to find Fine-Grained Image Classification models and implementations.

rwightman/pytorch-image-models • 7 papers • 29,713 stars

open-mmlab/mmclassification • 4 papers • 3,154 stars

osmr/imgclsmob • 4 papers • 2,917 stars

Westlake-AI/openmixup • 4 papers • 568 stars

These are 4 of the 25 libraries listed for this task.


Subtasks

Displaced People Recognition

Latest papers


Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

cl-premonition/premonition • 12 Mar 2024

We show here that the combination of a large language model and an image generation model can similarly provide useful premonitions as to how a continual learning challenge might develop over time.


Invariant Test-Time Adaptation for Vision-Language Model Generalization

mahuanaaa/intta • 1 Mar 2024

Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired datasets.


Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

cvl-umass/adaptclipzs • 4 Jan 2024

By prompting LLMs in various ways, we generate descriptions that capture visual appearance, habitat, and geographic regions and pair them with existing attributes such as the taxonomic structure of the categories.
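The general idea of classifying with text descriptions can be illustrated with a generic CLIP-style zero-shot classifier. Each class is represented by the average of the unit-normalized text embeddings of its descriptions, and an image is assigned to the class whose averaged embedding has the highest cosine similarity with the image embedding. This is a minimal sketch of that common technique, not the method of the paper above. The embeddings are assumed to come from some vision-language model's image and text encoders, and the helper names and 3-d toy vectors are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def class_prototype(description_embeddings):
    """Normalize each description embedding, average them, then re-normalize."""
    unit = [normalize(e) for e in description_embeddings]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return normalize(mean)


def zero_shot_predict(image_embedding, class_descriptions):
    """Return the best class and the cosine score of every class.

    class_descriptions maps a class name to a list of text embeddings,
    one per generated description of that class.
    """
    image = normalize(image_embedding)
    scores = {}
    for name, embeddings in class_descriptions.items():
        proto = class_prototype(embeddings)
        # Dot product of two unit vectors equals their cosine similarity.
        scores[name] = sum(a * b for a, b in zip(image, proto))
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical embeddings standing in for real encoder outputs.
descriptions = {
    "cardinal": [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]],
    "blue jay": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
label, scores = zero_shot_predict([1.0, 0.0, 0.0], descriptions)
print(label)  # cardinal
```

Averaging several descriptions per class lets one prototype combine cues such as appearance, habitat, and region. No training is needed, because classes are defined only by their text.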


Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

njuyued/soc4ss-fgvc • 19 Dec 2023

While semi-supervised learning (SSL) has yielded promising results, a more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC).


Good Questions Help Zero-Shot Image Reasoning

kai-wen-yang/qvix • 4 Dec 2023

QVix enables a wider exploration of visual scenes, improving the LVLMs' reasoning accuracy and depth in tasks such as visual question answering and visual entailment.


GIFT: Generative Interpretable Fine-Tuning Transformers

savadikarc/gift • 1 Dec 2023

For the latter, in contrast to prior art that directly introduces new model parameters (often in low-rank approximation form) to be learned during fine-tuning on downstream data, we propose a method that learns to generate the fine-tuning parameters.


Meta Co-Training: Two Views are Better than One

jayrothenberger/meta-co-training • 29 Nov 2023

We show that in the common case when independent views are not available we can construct such views inexpensively using pre-trained models.


A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

imageomics/intr • 7 Nov 2023

Unlike mainstream classifiers that wait until the last fully-connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image.


BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

mvrl/birdsat • 29 Oct 2023

We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.


Gramian Attention Heads are Strong yet Efficient Vision Learners

lab-lvm/imagenet-models • ICCV 2023 • 25 Oct 2023

We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads) instead of relying on channel expansion or additional building blocks.
