Adaptive Region Pooling for Fine-Grained Representation Learning

29 Sep 2021 · Tsai-Shien Chen, Chih-Ting Liu, Shao-Yi Chien

Fine-grained recognition aims to discriminate between the sub-categories of images within one general category. It is fundamentally difficult because it requires extracting fine-grained features from subtle regions. However, a Convolutional Neural Network typically applies strided operations to downsample the representation, which reduces the feature resolution and leads to a significant loss of fine-grained information. In this paper, we propose Adaptive Region Pooling (ARP): a novel downsampling algorithm that makes the network focus only on a smaller but more critical region while simultaneously increasing the resolution of the sub-sampled feature. ARP provides a trade-off mechanism that allows users to actively balance the scale of the receptive field and the granularity of the feature. Also, without any learnable parameters, ARP gives the network a more stable training process and earlier convergence. Extensive experiments qualitatively and quantitatively validate the effectiveness and efficiency of the proposed pooling operation and show superior performance against state-of-the-art methods in both image classification and image retrieval.
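
To make the idea of region-focused downsampling concrete, the following is a minimal illustrative sketch and not the authors' ARP implementation, since the abstract does not specify the algorithm. It assumes a single-channel feature map with non-negative activations, picks the "critical region" as a window centered on the activation-weighted centroid (an assumption), and resamples that window bilinearly to the output size. The names `region_pool`, `strided_downsample`, and the parameter `ratio` are hypothetical; `ratio` stands in for the trade-off between receptive-field scale and feature granularity, and no parameter in the sketch is learned.

```python
def strided_downsample(fmap, stride=2):
    """Baseline: keep every `stride`-th row and column of a 2D feature map."""
    return [row[::stride] for row in fmap[::stride]]


def _bilinear(fmap, y, x):
    """Bilinearly interpolate fmap at the fractional position (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = min(int(y), h - 1), min(int(x), w - 1)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * fmap[y0][x0] + fx * fmap[y0][x1]
    bottom = (1 - fx) * fmap[y1][x0] + fx * fmap[y1][x1]
    return (1 - fy) * top + fy * bottom


def region_pool(fmap, out_size, ratio):
    """Hypothetical region-focused downsampling (illustration only).

    Crops a window spanning `ratio` of each side of `fmap`, centered on the
    activation-weighted centroid (assumed selection rule), and resamples it
    to out_size x out_size. Returns the pooled map and the sampling step
    (step_y, step_x) in input pixels; a smaller step means finer granularity.
    """
    if out_size < 2 or not 0 < ratio <= 1:
        raise ValueError("need out_size >= 2 and 0 < ratio <= 1")
    h, w = len(fmap), len(fmap[0])

    # Centroid of the activations; fall back to the map center if all zero.
    total = sum(sum(row) for row in fmap)
    if total > 0:
        cy = sum(y * sum(row) for y, row in enumerate(fmap)) / total
        cx = sum(x * v for row in fmap for x, v in enumerate(row)) / total
    else:
        cy, cx = (h - 1) / 2, (w - 1) / 2

    # Window extent in pixel coordinates, clamped to stay inside the map.
    win_h, win_w = ratio * (h - 1), ratio * (w - 1)
    top = max(0.0, min(cy - win_h / 2, (h - 1) - win_h))
    left = max(0.0, min(cx - win_w / 2, (w - 1) - win_w))

    # Sample the window on a regular out_size x out_size grid.
    step_y, step_x = win_h / (out_size - 1), win_w / (out_size - 1)
    out = [
        [_bilinear(fmap, top + i * step_y, left + j * step_x)
         for j in range(out_size)]
        for i in range(out_size)
    ]
    return out, (step_y, step_x)
```

In this sketch, `ratio=1.0` samples the whole map coarsely, much like ordinary downsampling, while a smaller `ratio` covers less of the map but samples it with a proportionally smaller step. For example, an isolated activation that falls between the sampled positions of `strided_downsample(fmap, 2)` can still contribute to the output of `region_pool(fmap, 4, 0.5)` on the same 8×8 map.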
