Fine-Grained Image Classification

171 papers with code • 35 benchmarks • 36 datasets

Fine-Grained Image Classification is a computer vision task in which images are classified into subcategories within a larger category, for example different species of birds or different types of flowers. The task is called fine-grained because the model must tell apart subcategories that differ only in subtle details of appearance and pattern. This makes it harder than standard image classification.

(Image credit: Looking for the Devil in the Details)

Latest papers with no code

Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization

no code yet • 15 Mar 2024

To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC.

Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains

no code yet • 28 Feb 2024

Inspired by the properties of Neural Collapse (NC), we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to a simplex Equiangular Tight Frame (ETF).
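A simplex ETF for K classes is a set of K unit vectors whose pairwise inner products all equal -1/(K-1), i.e. the vectors are spread out as far apart as possible. The sketch below builds one with the standard construction sqrt(K/(K-1)) · U (I - 11ᵀ/K), assuming for simplicity that U is the first K standard basis vectors (so it needs dim ≥ K). The paper's title mentions a fixed *random* classifier, so its frame is probably randomly oriented. A random orthonormal U could replace the standard basis here without changing the geometry. The function names are illustrative, not the paper's code.

```python
import math


def simplex_etf(num_classes: int, dim: int) -> list[list[float]]:
    """Return num_classes unit vectors in R^dim with pairwise inner product -1/(K-1).

    Assumption: the orthonormal frame U is the first K standard basis vectors,
    so this requires dim >= num_classes. A random orthonormal U works equally well.
    """
    k = num_classes
    if k < 2 or dim < k:
        raise ValueError("need num_classes >= 2 and dim >= num_classes")
    scale = math.sqrt(k / (k - 1))
    etf = []
    for c in range(k):
        # Column c of sqrt(K/(K-1)) * U (I - 11^T / K); coordinates >= K stay zero.
        vec = [0.0] * dim
        for j in range(k):
            vec[j] = scale * ((1.0 if j == c else 0.0) - 1.0 / k)
        etf.append(vec)
    return etf


def logits(features: list[float], etf: list[list[float]]) -> list[float]:
    """Fixed (non-trainable) last layer: logit_c = <w_c, h>. Only the backbone producing h is trained."""
    return [sum(w_i * h_i for w_i, h_i in zip(w, features)) for w in etf]


W = simplex_etf(4, 8)  # 4 classes embedded in an 8-dimensional feature space
```

Because the classifier weights never change, training only has to pull each class's features toward its fixed ETF direction.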

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

no code yet • 4 Jan 2024

By prompting LLMs in various ways, we generate descriptions that capture visual appearance, habitat, and geographic regions and pair them with existing attributes such as the taxonomic structure of the categories.
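One common way to use several text descriptions per class at inference time is to embed each description, average the normalized embeddings into a class prototype, and pick the class whose prototype has the highest cosine similarity to the image embedding. This is a minimal sketch of that generic idea, not necessarily the adaptation method of the paper above. The 3-dimensional vectors are hypothetical stand-ins for the outputs of a VLM's text and image encoders.

```python
import math


def normalize(v: list[float]) -> list[float]:
    # L2-normalize; assumes v is not the zero vector.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def class_prototypes(desc_embeddings: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average the normalized embeddings of each class's descriptions, then renormalize."""
    protos = {}
    for label, embs in desc_embeddings.items():
        normed = [normalize(e) for e in embs]
        mean = [sum(col) / len(normed) for col in zip(*normed)]
        protos[label] = normalize(mean)
    return protos


def zero_shot_predict(image_embedding: list[float], protos: dict[str, list[float]]):
    """Return the best-matching label and the cosine score for every class."""
    img = normalize(image_embedding)
    scores = {label: sum(a * b for a, b in zip(img, p)) for label, p in protos.items()}
    return max(scores, key=scores.get), scores


# Hypothetical description embeddings (e.g. appearance, habitat) for two bird species.
descriptions = {
    "cardinal": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "blue jay": [[0.0, 1.0, 0.1], [0.1, 0.8, 0.0]],
}
protos = class_prototypes(descriptions)
label, scores = zero_shot_predict([0.8, 0.2, 0.1], protos)
```

Averaging several descriptions makes each prototype less sensitive to the wording of any single prompt.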

Masked Image Modeling via Dynamic Token Morphing

no code yet • 30 Dec 2023

Masked Image Modeling (MIM) has emerged as a promising option for Vision Transformers among the various self-supervised learning (SSL) methods.
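In MIM, an image is split into patches, a large fraction of them is hidden, and the model is trained to reconstruct or predict the hidden content from the visible ones. The sketch below shows only the generic masking step, assuming MAE-style uniform random masking. It does not implement the dynamic token morphing proposed in the paper. The 196-patch count assumes a 224×224 image cut into 16×16 patches, and the 0.75 mask ratio is an illustrative choice.

```python
import random


def random_patch_mask(num_patches: int, mask_ratio: float, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling."""
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# 224x224 image with 16x16 patches -> 14 * 14 = 196 patch tokens (assumed setup).
visible, masked = random_patch_mask(196, 0.75, seed=0)
patch_tokens = [f"patch_{i}" for i in range(196)]  # placeholders for patch embeddings
encoder_input = [patch_tokens[i] for i in visible]  # the encoder sees only these
```

Because the encoder only ever sees the visible subset, any long-range structure it learns must be inferred from sparse evidence.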

Human in-the-Loop Estimation of Cluster Count in Datasets via Similarity-Driven Nested Importance Sampling

no code yet • 8 Dec 2023

Human feedback on the pairwise similarity can be used to improve the clustering, but existing approaches do not guarantee accurate count estimates.

OmniVec: Learning robust representations with cross modal sharing

no code yet • 7 Nov 2023

We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, which allows us to achieve state-of-the-art results on most of the benchmarks.

Dining on Details: LLM-Guided Expert Networks for Fine-Grained Food Recognition

no code yet • MADiMa Workshop in ACM Multimedia 2023

Trained end-to-end with multi-task learning, this method improves performance on fine-grained food recognition, especially for highly similar classes.

Longer-range Contextualized Masked Autoencoder

no code yet • 20 Oct 2023

However, because the encoder is trained on only a subset of the pixels, MIM pre-training can have limited ability to capture long-range dependencies.

Delving into Multimodal Prompting for Fine-grained Visual Classification

no code yet • 16 Sep 2023

In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model.
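CLIP itself is pre-trained with a symmetric contrastive objective. Within a batch of N image-text pairs, the cosine similarities form an N×N logit matrix, and cross-entropy pushes each image toward its own caption (rows) and each caption toward its own image (columns). The pure-Python sketch below computes that loss for illustration only. It is not MP-FGVC's prompting method. The fixed temperature of 0.07 is an assumption that matches CLIP's initial value; in CLIP the temperature is learned.

```python
import math


def _normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax(row: list[float]) -> list[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(image_embs, text_embs, temperature: float = 0.07) -> float:
    """Symmetric cross-entropy over cosine-similarity logits; pair i is the positive for row i and column i."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)] for i in range(n)]
    # Image -> text: row i should assign the most probability to column i.
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: column j should assign the most probability to row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

With toy 2-D embeddings, correctly paired inputs give a loss near zero, and swapping the captions gives a loss of roughly 1/0.07 ≈ 14.3.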

Deep Neural Networks Fused with Textures for Image Classification

no code yet • 3 Aug 2023

Fine-grained image classification (FGIC) is a challenging computer vision task because of small visual differences between subcategories combined with large intra-class variations.
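The excerpt does not say which texture descriptor is fused with the network. As an illustrative assumption, a classic choice is the 8-neighbour local binary pattern (LBP) histogram. Each interior pixel gets an 8-bit code recording which neighbours are at least as bright as it is, and the normalized code histogram summarizes local texture. The sketch computes that histogram on a grayscale image given as nested lists. It then concatenates the histogram with a deep feature vector, a simple late fusion that is hypothetical and may differ from the paper's design.

```python
def lbp_histogram(img: list[list[float]]) -> list[float]:
    """Normalized 256-bin histogram of 8-neighbour LBP codes over the interior pixels of a 2-D image."""
    h, w = len(img), len(img[0])
    # Neighbours visited clockwise from the top-left; each sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    count = 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [c / count for c in hist] if count else hist


def fuse(deep_features: list[float], texture_features: list[float]) -> list[float]:
    """Hypothetical late fusion: concatenate CNN features with the texture histogram."""
    return list(deep_features) + list(texture_features)
```

A perfectly flat patch maps every interior pixel to code 255, while an isolated bright pixel on a dark background maps to code 0.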