Fine-Grained Image Classification
173 papers with code • 35 benchmarks • 36 datasets
Fine-Grained Image Classification is a computer vision task in which the goal is to classify images into subcategories within a larger category, for example distinguishing different species of birds or different types of flowers. The task is considered fine-grained because the model must distinguish subtle differences in visual appearance and patterns, which makes it more challenging than regular image classification.
(Image credit: Looking for the Devil in the Details)
Latest papers with no code
Adaptive Fine-Grained Predicates Learning for Scene Graph Generation
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.
Large Neural Networks Learning from Scratch with Very Few Data and without Explicit Regularization
We show that very large Convolutional Neural Networks with millions of weights do learn with only a handful of training samples and without image augmentation, explicit regularization or pretraining.
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions, which can help reinforce the spatial-wise discriminative clues for recognition.
Reinforcing Generated Images via Meta-learning for One-Shot Fine-Grained Visual Recognition
One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes.
ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator
Recently, several Vision Transformer (ViT) based methods have been proposed for Fine-Grained Visual Classification (FGVC). These methods significantly surpass existing CNN-based ones, demonstrating the effectiveness of ViT in FGVC tasks. However, there are some limitations when applying ViT directly to FGVC. First, ViT needs to split images into patches and calculate the attention of every pair, which may result in heavy redundant calculation and unsatisfying performance when handling fine-grained images with complex background and small objects. Second, a standard ViT only utilizes the class token in the final layer for classification, which is not enough to extract comprehensive fine-grained information.
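The first limitation above, attention computed over every pair of patches, can be made concrete with a little arithmetic. The following is a minimal sketch, not code from the ViT-FOD paper: the function name `vit_attention_cost` and the example image and patch sizes are illustrative assumptions, and it assumes a square image split into non-overlapping square patches plus one class token.

```python
def vit_attention_cost(image_size: int, patch_size: int, use_class_token: bool = True) -> dict:
    """Count tokens and pairwise attention entries for one self-attention head.

    Hypothetical helper for illustration; assumes a square image divided
    into non-overlapping square patches.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")

    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    # A standard ViT prepends a single class token to the patch sequence.
    num_tokens = num_patches + (1 if use_class_token else 0)
    # Self-attention scores every token against every token (quadratic cost).
    attention_pairs = num_tokens ** 2

    return {
        "num_patches": num_patches,
        "num_tokens": num_tokens,
        "attention_pairs": attention_pairs,
    }


if __name__ == "__main__":
    # Example sizes are assumptions chosen for illustration.
    for size in (224, 448):
        print(size, vit_attention_cost(size, patch_size=16))
```

Under these assumptions, doubling the image side from 224 to 448 pixels roughly quadruples the token count and increases the number of attention pairs about sixteenfold, which is why background patches that carry no discriminative detail become expensive.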
Automatic Fine-grained Glomerular Lesion Recognition in Kidney Pathology
Recognition of glomeruli lesions is the key for diagnosis and treatment planning in kidney pathology; however, the coexisting glomerular structures such as mesangial regions exacerbate the difficulties of this task.
Bridge the Gap between Supervised and Unsupervised Learning for Fine-Grained Classification
Unsupervised learning technology has caught up with or even surpassed supervised learning technology in general object classification (GOC) and person re-identification (re-ID).
Progressive Multi-stage Interactive Training in Mobile Network for Fine-grained Recognition
Finally, using the progressive training (P), the features extracted by the model in different stages can be fully utilized and fused with each other.
Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding
We compared the robustness of CNN and ViT by assuming various image corruptions that may appear in practical vision tasks.
A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition
Learning subtle representations of object parts plays a vital role in the field of fine-grained visual recognition (FGVR).