Fine-Grained Image Classification
171 papers with code • 35 benchmarks • 36 datasets
Fine-Grained Image Classification is a computer vision task where the goal is to classify images into subcategories within a larger category, for example, distinguishing between species of birds or types of flowers. The task is called fine-grained because the model must pick up on subtle differences in visual appearance and patterns, which makes it more challenging than regular image classification.
(Image credit: Looking for the Devil in the Details)
Latest papers with no code
Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization
To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC.
Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains
Inspired by Neural Collapse (NC) properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to an Equiangular Tight Frame (ETF).
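As a rough illustration of what fixing the last layer "according to ETF" can mean, the sketch below builds a simplex equiangular tight frame with NumPy using the standard construction sqrt(K/(K-1)) · U · (I − 11ᵀ/K). It is not the paper's code: the function name `simplex_etf`, the random orthonormal basis `U`, the seed, and the restriction to `feat_dim >= num_classes` are all assumptions made for this example.

```python
import numpy as np


def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes feat_dim >= num_classes.
    """
    if feat_dim < num_classes:
        raise ValueError("this sketch assumes feat_dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random matrix gives orthonormal columns (U^T U = I).
    u, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) 1 1^T removes the mean across class vectors.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ centering


# Example: 5 classes, 16-dimensional features (both values are arbitrary).
W = simplex_etf(num_classes=5, feat_dim=16)
gram = W.T @ W  # pairwise inner products between class vectors
off_diag = gram[~np.eye(5, dtype=bool)]
```

In such a frame every class vector has unit norm and every pair of distinct class vectors has the same inner product, -1/(K-1). A classifier fixed this way would be kept frozen while only the feature extractor is trained.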
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
By prompting LLMs in various ways, we generate descriptions that capture visual appearance, habitat, and geographic regions and pair them with existing attributes such as the taxonomic structure of the categories.
Masked Image Modeling via Dynamic Token Morphing
Masked Image Modeling (MIM) arises as a promising option for Vision Transformers among various self-supervised learning (SSL) methods.
Human in-the-Loop Estimation of Cluster Count in Datasets via Similarity-Driven Nested Importance Sampling
Human feedback on the pairwise similarity can be used to improve the clustering, but existing approaches do not guarantee accurate count estimates.
OmniVec: Learning robust representations with cross modal sharing
We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, and this allows us to achieve state-of-the-art results on most of the benchmarks.
Dining on Details: LLM-Guided Expert Networks for Fine-Grained Food Recognition
Trained through an end-to-end multi-task learning process, this method enhances performance in the fine-grained food recognition task, showing exceptional prowess with highly similar classes.
Longer-range Contextualized Masked Autoencoder
However, as the encoder is trained with partial pixels, the MIM pre-training can suffer from a low capability of understanding long-range dependency.
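To make "trained with partial pixels" concrete, here is a minimal sketch of random patch masking as commonly used in masked image modeling. It is a generic illustration, not the method of any paper listed here. The helper names `patchify` and `random_patch_mask`, the 16-pixel patch size, and the 0.75 mask ratio are assumptions chosen for the example.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> np.ndarray:
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    rng = np.random.default_rng(seed)
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


# Example with an arbitrary random "image"; sizes and ratio are assumptions.
image = np.random.default_rng(1).random((224, 224, 3))
patches = patchify(image, patch=16)
mask = random_patch_mask(len(patches), mask_ratio=0.75)
visible = patches[~mask]  # the only patches the encoder would see
```

With these settings the encoder receives 49 of the 196 patches, which illustrates why relationships between distant image regions can be hard to learn from the visible subset alone.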
Delving into Multimodal Prompting for Fine-grained Visual Classification
In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model.
Deep Neural Networks Fused with Textures for Image Classification
Fine-grained image classification (FGIC) is a challenging task in computer vision due to small visual differences between subcategories but large intra-class variations.