Local Semantic Enhanced ConvNet for Aerial Scene Recognition

Aerial scene recognition is challenging because of the complicated object distribution and spatial arrangement in a large-scale aerial image. Recent studies explore the local semantic representation capability of deep learning models, but how to precisely perceive the key local regions remains an open problem. In this paper, we present a local semantic enhanced ConvNet (LSE-Net) for aerial scene recognition, which mimics the human visual perception of key local regions in aerial scenes, with the aim of building a discriminative local semantic representation. Our LSE-Net consists of a context enhanced convolutional feature extractor, a local semantic perception module and a classification layer. First, we design multi-scale dilated convolution operators that fuse multi-level and multi-scale convolutional features in a trainable manner, so that the local feature responses in an aerial scene are fully captured. These features are then fed into our two-branch local semantic perception module. In this module, we design a context-aware class peak response (CACPR) measurement to precisely depict the visual impulse of key local regions and the corresponding context information. In addition, a spatial attention weight matrix is extracted to describe the importance of each key local region for the aerial scene. Finally, the refined class confidence maps are fed into the classification layer. Extensive experiments on three aerial scene classification benchmarks indicate that our LSE-Net achieves state-of-the-art performance, which validates the effectiveness of our local semantic perception module and CACPR measurement.
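The abstract does not give the formula for the CACPR measurement, so the following is only a minimal sketch of the general idea of a peak response that also takes its surroundings into account. It assumes that a peak is a strictly positive local maximum of a class confidence map, and that "context" means the mean response in a square window around each peak. The function names, the `radius` parameter and the averaging rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def _window(conf_map: Grid, i: int, j: int, radius: int) -> List[float]:
    """Values in the (2*radius+1)-sized square around (i, j), clipped at the borders."""
    h, w = len(conf_map), len(conf_map[0])
    return [
        conf_map[y][x]
        for y in range(max(0, i - radius), min(h, i + radius + 1))
        for x in range(max(0, j - radius), min(w, j + radius + 1))
    ]


def local_peaks(conf_map: Grid, radius: int = 1) -> List[Tuple[int, int]]:
    """Positions whose value is positive and equals the maximum of their window.

    Assumption: this is a simple stand-in for 'key local regions'.
    """
    peaks = []
    for i in range(len(conf_map)):
        for j in range(len(conf_map[0])):
            value = conf_map[i][j]
            if value > 0 and value == max(_window(conf_map, i, j, radius)):
                peaks.append((i, j))
    return peaks


def context_aware_peak_score(conf_map: Grid, radius: int = 1) -> float:
    """Mean over all peaks of the average response in each peak's window.

    Assumption: averaging the window is one hypothetical way to fold context
    into a peak response; the paper's CACPR may be defined differently.
    """
    peaks = local_peaks(conf_map, radius)
    if not peaks:
        return 0.0
    total = 0.0
    for i, j in peaks:
        window = _window(conf_map, i, j, radius)
        total += sum(window) / len(window)
    return total / len(peaks)


# Toy 4x4 confidence map for a single class, with two isolated responses.
demo_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4],
]
print(local_peaks(demo_map))
print(context_aware_peak_score(demo_map))
```

In this toy setting an isolated spike is averaged down by its empty surroundings, whereas a peak sitting in a region of consistently high response would keep a high score. That is one plausible reading of why context is attached to the peak response.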

Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Scene Recognition | AID | LSENet | Accuracy | 96.36 | # 2 |
| Aerial Scene Classification | AID (20% as trainset) | LSE-Net | Accuracy | 94.41 | # 8 |
| Aerial Scene Classification | AID (50% as trainset) | LSE-Net | Accuracy | 96.36 | # 8 |
| Aerial Scene Classification | NWPU (10% as trainset) | LSE-Net | Accuracy | 92.23 | # 8 |
| Aerial Scene Classification | NWPU (20% as trainset) | LSE-Net | Accuracy | 93.34 | # 11 |
| Image Classification | RESISC45 | LSENet | Top 1 Accuracy | 93.49 | # 6 |
| Aerial Scene Classification | UCM (50% as trainset) | LSE-Net | Accuracy | 98.53 | # 4 |
| Aerial Scene Classification | UCM (80% as trainset) | LSE-Net | Accuracy | 99.78 | # 3 |
| Scene Classification | UC Merced Land Use Dataset | LSE-Net | Accuracy (%) | 99.78 | # 3 |
