Fine-Grained Image Classification
173 papers with code • 35 benchmarks • 36 datasets
Fine-Grained Image Classification is a computer vision task in which the goal is to classify images into subcategories within a larger category, for example distinguishing between species of birds or types of flowers. The task is called fine-grained because the model must tell apart subtle differences in visual appearance and pattern, which makes it more challenging than ordinary image classification.
(Image credit: Looking for the Devil in the Details)
Latest papers
DINOv2: Learning Robust Visual Features without Supervision
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Your Diffusion Model is Secretly a Zero-Shot Classifier
Our generative approach to classification, which we call Diffusion Classifier, attains strong results on a variety of benchmarks and outperforms alternative methods of extracting knowledge from diffusion models.
Take 5: Interpretable Image Classification with a Handful of Features
We argue that a human can only understand the decision of a machine learning model if the features are interpretable and only very few of them are used for a single decision.
Learn from Each Other to Classify Better: Cross-layer Mutual Attention Learning for Fine-grained Visual Classification
Specifically, this work views the shallow to deep layers of CNNs as “experts” knowledgeable about different perspectives.
Cascading Hierarchical Networks with Multi-task Balanced Loss for Fine-grained hashing
To improve the retrieval accuracy of fine-grained hashing, we propose a cascaded network to learn compact and highly semantic hash codes, and introduce an attention-guided data augmentation method.
Fine-grained Visual Classification with High-temperature Refinement and Background Suppression
The high-temperature refinement module allows the model to learn the appropriate feature scales by refining the feature map at different scales and improving the learning of diverse features.
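As background for the term "high-temperature", the sketch below shows the generic effect of a softmax temperature: dividing logits by a larger temperature yields a flatter, higher-entropy distribution while preserving the ranking of classes. This is not the paper's refinement module; the function names, the logits, and the temperature values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=np.float64)
    return float(-(p * np.log(p + 1e-12)).sum())

# Hypothetical class scores for a single image (illustrative values only).
logits = np.array([4.0, 1.0, 0.5, 0.2])

sharp = softmax_with_temperature(logits, temperature=1.0)  # peaked distribution
soft = softmax_with_temperature(logits, temperature=8.0)   # flatter distribution

print("T=1:", np.round(sharp, 3), "entropy:", round(entropy(sharp), 3))
print("T=8:", np.round(soft, 3), "entropy:", round(entropy(soft), 3))
```

With a higher temperature, the non-maximal classes receive more probability mass, which is one common reason temperature is raised when a softer target distribution is wanted.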
Fine-Grained Visual Classification via Internal Ensemble Learning Transformer
The proposed IELT involves three main modules: a multi-head voting (MHV) module, a cross-layer refinement (CLR) module, and a dynamic selection (DS) module.
LiT Tuned Models for Efficient Species Detection
Recent advances in training vision-language models have demonstrated unprecedented robustness and transfer learning effectiveness; however, standard computer vision datasets are image-only, and therefore not well adapted to such training methods.
The CropAndWeed Dataset: A Multi-Modal Learning Approach for Efficient Crop and Weed Manipulation
Precision agriculture, and especially the application of automated weed intervention, represents an increasingly essential research area, as sustainability and efficiency considerations are becoming more and more relevant.
Multi-View Active Fine-Grained Visual Recognition
Despite the remarkable progress of fine-grained visual classification (FGVC) with years of history, it is still limited to recognizing 2D images.