Fine-Grained Image Recognition
33 papers with code • 4 benchmarks • 9 datasets
Latest papers
Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning
Our findings elucidate that our proposed methodology establishes a new benchmark for SOTA performance in food recognition on the CNFOOD-241 dataset.
Hawkeye: A PyTorch-based Library for Fine-Grained Image Recognition with Deep Learning
However, the absence of a unified open-source software library covering various paradigms in FGIR poses a significant challenge for researchers and practitioners in the field.
Fine-grained Recognition with Learnable Semantic Data Augmentation
Since images belonging to the same meta-category usually share similar visual appearances, mining discriminative visual cues is the key to distinguishing fine-grained categories.
PaLI-X: On Scaling up a Multilingual Vision and Language Model
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture.
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Large-scale multi-modal pre-training models such as CLIP and PaLI exhibit strong generalization on various visual domains and tasks.
Taxonomy and evolution predicting using deep learning in images
Molecular and morphological characters, as important parts of biological taxonomy, are contradictory but need to be integrated.
Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals
A visual counterfactual explanation replaces image regions in a query image with regions from a distractor image such that the system's decision on the transformed image changes to the distractor class.
A Novel Plug-in Module for Fine-Grained Visual Classification
Visual classification can be divided into coarse-grained and fine-grained classification.
High-Order-Interaction for weakly supervised Fine-Grained Visual Categorization
Methods based on bilinear pooling form one of the main categories for computing the interaction between deep features and have shown high effectiveness.
The Curious Layperson: Fine-Grained Image Recognition without Expert Labels
We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis.
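The High-Order-Interaction entry above describes bilinear pooling as a way of computing the interaction between deep features. The sketch below illustrates the general idea in plain Python: take the outer product of two feature vectors at each spatial location, average over locations, then apply a signed square root and L2 normalization. The normalization steps follow common practice for bilinear CNN descriptors and are an assumption here, not something stated in the listed paper. The function name `bilinear_pool` and the list-of-lists input layout are hypothetical and chosen only for illustration.

```python
import math


def bilinear_pool(features_a, features_b):
    """Pool two feature maps into one second-order descriptor.

    features_a: list of spatial locations, each a list of C_a floats.
    features_b: list of spatial locations, each a list of C_b floats.
    Returns a flat list of C_a * C_b floats (row-major outer product).
    """
    if not features_a or len(features_a) != len(features_b):
        raise ValueError("both feature maps need the same non-zero number of locations")

    c_a, c_b = len(features_a[0]), len(features_b[0])
    pooled = [0.0] * (c_a * c_b)

    # Sum the outer product of the two feature vectors over all locations.
    for vec_a, vec_b in zip(features_a, features_b):
        for i, a in enumerate(vec_a):
            for j, b in enumerate(vec_b):
                pooled[i * c_b + j] += a * b

    # Average over locations.
    num_locations = len(features_a)
    pooled = [value / num_locations for value in pooled]

    # Signed square root (assumed post-processing, see lead-in).
    pooled = [math.copysign(math.sqrt(abs(value)), value) for value in pooled]

    # L2 normalization; an all-zero descriptor is returned unchanged.
    norm = math.sqrt(sum(value * value for value in pooled))
    return [value / norm for value in pooled] if norm > 0 else pooled
```

Calling `bilinear_pool` with two maps of 2 locations and 2 channels each returns a 4-element descriptor. In practice the two inputs would be convolutional feature maps from one or two networks, flattened over height and width.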