Fine-Grained Image Classification

172 papers with code • 35 benchmarks • 36 datasets

Fine-Grained Image Classification is a computer vision task in which the goal is to classify images into subcategories within a larger category, for example, different species of birds or different types of flowers. The task is considered fine-grained because the model must distinguish subtle differences in visual appearance and patterns, which makes it more challenging than regular image classification.

(Image credit: Looking for the Devil in the Details)

Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning

cl-premonition/premonition 12 Mar 2024

We show here that the combination of a large language model and an image generation model can similarly provide useful premonitions as to how a continual learning challenge might develop over time.

0

Invariant Test-Time Adaptation for Vision-Language Model Generalization

mahuanaaa/intta 1 Mar 2024

Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired datasets.

0

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

cvl-umass/adaptclipzs 4 Jan 2024

By prompting LLMs in various ways, we generate descriptions that capture visual appearance, habitat, and geographic regions and pair them with existing attributes such as the taxonomic structure of the categories.
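To make the idea of pairing categories with text descriptions concrete, here is a minimal sketch of description-based zero-shot classification. It is not the authors' method: it assumes that an image and several descriptions per class have already been embedded into a shared space by some vision-language model, and it simply averages each class's description embeddings into a prototype and picks the class with the highest cosine similarity. The function names, class names, and 3-dimensional embeddings are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_prototype(description_embeddings):
    """Average the unit-normalized description embeddings of one class."""
    unit = [normalize(e) for e in description_embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def zero_shot_classify(image_embedding, class_descriptions):
    """Return the best-matching class and the cosine score of every class."""
    image = normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(image, class_prototype(embs)))
        for name, embs in class_descriptions.items()
    }
    return max(scores, key=scores.get), scores


# Made-up embeddings standing in for encoded text descriptions
# (for example, appearance and habitat) of two similar-looking species.
class_descriptions = {
    "indigo bunting": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "blue grosbeak": [[0.1, 0.9, 0.0], [0.2, 0.8, 0.1]],
}

# Made-up embedding standing in for an encoded query image.
label, scores = zero_shot_classify([0.7, 0.3, 0.05], class_descriptions)
print(label, scores)
```

In a real pipeline the embeddings would come from the text and image encoders of a vision-language model. Averaging several descriptions per class is one common way to make the class representation less sensitive to the wording of any single prompt.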

2

Roll With the Punches: Expansion and Shrinkage of Soft Label Selection for Semi-supervised Fine-Grained Learning

njuyued/soc4ss-fgvc 19 Dec 2023

While semi-supervised learning (SSL) has yielded promising results, the more realistic SSL scenario remains to be explored, in which the unlabeled data exhibits extremely high recognition difficulty, e.g., fine-grained visual classification in the context of SSL (SS-FGVC).

4

Good Questions Help Zero-Shot Image Reasoning

kai-wen-yang/qvix 4 Dec 2023

QVix enables a wider exploration of visual scenes, improving the LVLMs' reasoning accuracy and depth in tasks such as visual question answering and visual entailment.

10

GIFT: Generative Interpretable Fine-Tuning Transformers

savadikarc/gift 1 Dec 2023

For the latter, in contrast to prior art that directly introduces new model parameters (often in low-rank approximation form) to be learned in fine-tuning with downstream data, we propose a method for learning to generate the fine-tuning parameters.

13

Meta Co-Training: Two Views are Better than One

jayrothenberger/meta-co-training 29 Nov 2023

We show that in the common case when independent views are not available, we can construct such views inexpensively using pre-trained models.

10

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

imageomics/intr 7 Nov 2023

Unlike mainstream classifiers that wait until the last fully-connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image.

24

BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

mvrl/birdsat 29 Oct 2023

We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world.

13

Gramian Attention Heads are Strong yet Efficient Vision Learners

lab-lvm/imagenet-models ICCV 2023

We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads) instead of relying on channel expansion or additional building blocks.

18
25 Oct 2023