Fine-Grained Image Classification

173 papers with code • 35 benchmarks • 36 datasets

Fine-Grained Image Classification is a computer vision task in which the goal is to classify images into subcategories within a larger category, for example distinguishing different species of birds or different types of flowers. The task is considered fine-grained because the model must distinguish subtle differences in visual appearance and patterns, which makes it more challenging than regular image classification.

(Image credit: Looking for the Devil in the Details)

Latest papers with no code

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

no code yet • 11 Jul 2022

The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.

Large Neural Networks Learning from Scratch with Very Few Data and without Explicit Regularization

no code yet • 18 May 2022

We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining.

Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification

no code yet • CVPR 2022

First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions, which can help reinforce the spatial-wise discriminative clues for recognition.
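To make the idea of attending from local high-response regions to the global image concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes that queries come from the top-k tokens ranked by a caller-supplied response score, that keys and values come from all tokens, and that a single head with no learned projections is enough for illustration. The names `select_high_response` and `cross_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_high_response(tokens, responses, top_k):
    """Keep the top_k tokens with the largest response score (assumed given),
    preserving their original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: responses[i], reverse=True)
    keep = sorted(ranked[:top_k])
    return [tokens[i] for i in keep]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs


# Toy example: four "global" tokens of dimension 2 with made-up response scores.
global_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
responses = [0.1, 0.9, 0.7, 0.2]

local_tokens = select_high_response(global_tokens, responses, top_k=2)
refined = cross_attention(local_tokens, global_tokens, global_tokens)
```

In this sketch the two selected local tokens are each replaced by a softmax-weighted mixture of all global tokens, which is one simple way local queries can interact with global context.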

Reinforcing Generated Images via Meta-learning for One-Shot Fine-Grained Visual Recognition

no code yet • 22 Apr 2022

One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes.

ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator

no code yet • 24 Mar 2022

Recently, several Vision Transformer (ViT) based methods have been proposed for Fine-Grained Visual Classification (FGVC). These methods significantly surpass existing CNN-based ones, demonstrating the effectiveness of ViT in FGVC tasks. However, there are some limitations when applying ViT directly to FGVC. First, ViT needs to split images into patches and calculate the attention of every pair, which may result in heavy redundant calculation and unsatisfying performance when handling fine-grained images with complex background and small objects. Second, a standard ViT only utilizes the class token in the final layer for classification, which is not enough to extract comprehensive fine-grained information.
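The cost of computing attention for every pair of patches can be illustrated with a short calculation. The sketch below assumes non-overlapping square patches, one extra class token, and a square input image. The patch size of 16 and the resolutions 224 and 448 are illustrative assumptions, not values taken from the abstract, and `vit_token_stats` is a hypothetical helper.

```python
def vit_token_stats(image_size, patch_size):
    """Count patches, tokens and pairwise attention scores for a square image.

    Assumes non-overlapping patches and a single class token.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    patches = per_side * per_side
    tokens = patches + 1  # patches plus the class token
    # Self-attention scores every token against every token, per head and layer.
    attention_pairs = tokens * tokens
    return {"patches": patches, "tokens": tokens, "attention_pairs": attention_pairs}


low_res = vit_token_stats(224, 16)
high_res = vit_token_stats(448, 16)

print(low_res)   # {'patches': 196, 'tokens': 197, 'attention_pairs': 38809}
print(high_res)  # {'patches': 784, 'tokens': 785, 'attention_pairs': 616225}
```

Under these assumptions, doubling the image side multiplies the number of attention pairs by roughly 16, while only the single class token from the final layer is used for classification.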

Automatic Fine-grained Glomerular Lesion Recognition in Kidney Pathology

no code yet • 11 Mar 2022

Recognition of glomerular lesions is key to diagnosis and treatment planning in kidney pathology; however, coexisting glomerular structures such as mesangial regions exacerbate the difficulty of this task.

Bridge the Gap between Supervised and Unsupervised Learning for Fine-Grained Classification

no code yet • 1 Mar 2022

Unsupervised learning technology has caught up with or even surpassed supervised learning technology in general object classification (GOC) and person re-identification (re-ID).

Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition

no code yet • 8 Dec 2021

Finally, using the progressive training (P), the features extracted by the model in different stages can be fully utilized and fused with each other.

Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding

no code yet • 16 Nov 2021

We compared the robustness of CNN and ViT by assuming various image corruptions that may appear in practical vision tasks.

A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition

no code yet • 4 Oct 2021

Learning subtle representations of object parts plays a vital role in the field of fine-grained visual recognition (FGVR).