High-Order-Interaction for weakly supervised Fine-Grained Visual Categorization

Fine-Grained Visual Categorization (FGVC) is a challenging task due to the large intra-subcategory and small inter-subcategory variances. Recent studies tackle this task through a weakly supervised manner without using the part annotation from the experts. Of those, methods based on bilinear pooling are one of the main categories for computing the interaction between deep features and have shown high effectiveness. However, these methods mainly focus on the correlation within one specific layer but largely ignore the high interactions between multiple layers. In this study, we argue that considering the high interaction between the features from multiple layers can help to learn more distinguishing fine-grained features. To this end, we propose a High-Order-Interaction (HOI) method for FGVC. In our HOI, an efficient cross-layer trilinear pooling is introduced to calculate the third-order interaction between three different layers. Third-order interactions of different combinations are then fused to form the final representation. HOI can produce more discriminative representations and be readily integrated with the two popular techniques, attention mechanism and triplet loss, to obtain superposed improvement. Extensive experiments conducted on four FGVC datasets show the great superiority of our method over bilinear-based methods and demonstrate that the proposed method achieves the state of the art.

PDF Neurocomputing 2021 PDF

Datasets


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Fine-Grained Image Classification CUB-200-2011 HOI-Net Accuracy 90.02% # 9
Fine-Grained Image Recognition CUB-200-2011 HOI-Net Accuracy 90.02% # 2
Fine-Grained Image Recognition CUB Birds HOI-Net 1:1 Accuracy 90.02% # 1

Methods


No methods listed for this paper. Add relevant methods here