Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization

15 Mar 2024 · Qin Xu, Sitong Li, Jiahui Wang, Bo Jiang, Jinhui Tang

Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, less effort has been devoted to assessing the quality of extracted visual representations. Intuitively, the network may struggle to capture discriminative features from low-quality samples, which leads to a significant decline in FGVC performance. To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. To model the spatial contextual relationship between rich part descriptors and global semantics, and thereby capture more discriminative details within the object, we design a novel multi-part and multi-scale cross-attention (MPMSCA) module. Before features are fed to the MPMSCA module, a part navigator addresses the scale confusion problem and accurately identifies the locally distinctive regions. Furthermore, we propose a generic multi-level semantic quality evaluation (MLSQE) module to progressively supervise and enhance hierarchical semantics from different levels of the backbone network. Finally, context-aware features from MPMSCA and semantically enhanced features from MLSQE are fed into the corresponding quality probing classifiers, which evaluate their quality in real time and thereby boost the discriminability of the feature representations. Comprehensive experiments on four popular and highly competitive FGVC datasets demonstrate the superiority of the proposed CSQA-Net over state-of-the-art methods.
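
The abstract describes cross-attention between part descriptors and global semantics but does not give the formulation. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in plain Python. It is not the authors' MPMSCA module: treating part descriptors as queries and global tokens as keys and values is an assumption, and the names `cross_attention`, `part_descriptors`, and `global_tokens`, as well as the toy vectors, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention.

    Each query attends over all keys; the output for a query is the
    attention-weighted sum of the values. Returns (outputs, weights).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy data (assumption): 2 part descriptors act as queries,
# 3 global tokens act as keys and values, feature dimension 4.
part_descriptors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
global_tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

context, attn = cross_attention(part_descriptors, global_tokens, global_tokens)
```

Here each row of `attn` is a probability distribution over the global tokens, and each row of `context` is the corresponding context-aware feature for one part descriptor. A multi-scale variant would presumably repeat this over features from several backbone stages, but the abstract does not specify how.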

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Fine-Grained Image Classification | CUB-200-2011 | CSQA-Net | Accuracy | 92.6% | # 4 |
| Fine-Grained Image Classification | FGVC Aircraft | CSQA-Net | Accuracy | 94.7% | # 4 |
| Fine-Grained Image Classification | NABirds | CSQA-Net | Accuracy | 92.3% | # 4 |
| Fine-Grained Image Classification | Stanford Cars | CSQA-Net | Accuracy | 95.6% | # 8 |