SNIPER: Efficient Multi-Scale Training

We present SNIPER, an algorithm for efficient multi-scale training in instance-level visual recognition tasks. Instead of processing every pixel in an image pyramid, SNIPER processes context regions around ground-truth instances (referred to as chips) at the appropriate scale. For background sampling, these context regions are generated using proposals extracted from a region proposal network trained with a short learning schedule. Hence, the number of chips generated per image during training changes adaptively with scene complexity. On the COCO dataset, SNIPER processes only 30% more pixels than the commonly used single-scale training at 800x1333 pixels, yet it also observes samples from extreme resolutions of the image pyramid, such as 1400x2000 pixels. Because SNIPER operates on resampled low-resolution chips (512x512 pixels), it can use a batch size as large as 20 on a single GPU even with a ResNet-101 backbone. It can therefore benefit from batch normalization during training without synchronizing batch-normalization statistics across GPUs. SNIPER brings training of instance-level recognition tasks like object detection closer to the protocol for image classification, and suggests that the commonly accepted guideline that training on high-resolution images is important for instance-level visual recognition tasks might not be correct. Our implementation, based on Faster-RCNN with a ResNet-101 backbone, obtains an mAP of 47.6% on the COCO dataset for bounding box detection and can process 5 images per second during inference on a single GPU. Code is available at https://github.com/MahyarNajibi/SNIPER/.
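
To make the pixel budget and the idea of chips concrete, here is a minimal Python sketch. It is not the authors' implementation. The first function only works through the arithmetic implied by the figures in the abstract (800x1333 baseline, 30% more pixels, 512x512 chips). The second is a simplified, hypothetical greedy chip selector: the names `pixel_budget`, `chips_for_scale`, and `inside`, the 32-pixel stride, the per-scale `valid_range` of object sizes, and the greedy covering rule are all assumptions made for illustration, and background (negative) chip sampling from proposals is left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

CHIP = 512  # chip side length in pixels, as stated in the abstract


def pixel_budget(baseline_hw=(800, 1333), overhead=0.30, chip=CHIP):
    """Return (baseline pixels, pixels with 30% overhead, equivalent chips)."""
    base = baseline_hw[0] * baseline_hw[1]
    total = base * (1.0 + overhead)
    return base, total, total / (chip * chip)


def inside(box: Box, window: Box) -> bool:
    """True if box lies completely inside window."""
    return (box[0] >= window[0] and box[1] >= window[1]
            and box[2] <= window[2] and box[3] <= window[3])


def chips_for_scale(boxes: List[Box], img_w: int, img_h: int, scale: float,
                    valid_range: Tuple[float, float],
                    chip: int = CHIP, stride: int = 32) -> List[Box]:
    """Greedily pick chip windows covering boxes that are valid at this scale.

    Assumption: a box is "valid" when sqrt(area) after rescaling falls in
    [valid_range[0], valid_range[1]). Boxes larger than a chip are skipped.
    """
    lo, hi = valid_range
    scaled = [tuple(c * scale for c in b) for b in boxes]
    todo = [b for b in scaled
            if lo <= ((b[2] - b[0]) * (b[3] - b[1])) ** 0.5 < hi]

    # Candidate windows on a regular grid, plus one flush with each far edge.
    last_x = max(int(img_w * scale) - chip, 0)
    last_y = max(int(img_h * scale) - chip, 0)
    xs = sorted(set(list(range(0, last_x + 1, stride)) + [last_x]))
    ys = sorted(set(list(range(0, last_y + 1, stride)) + [last_y]))
    candidates = [(x, y, x + chip, y + chip) for x in xs for y in ys]

    chosen: List[Box] = []
    while todo:
        # Pick the window that covers the most still-uncovered boxes.
        best = max(candidates, key=lambda w: sum(inside(b, w) for b in todo))
        if not any(inside(b, best) for b in todo):
            break  # remaining boxes do not fit in any chip at this scale
        chosen.append(best)
        todo = [b for b in todo if not inside(b, best)]
    return chosen


if __name__ == "__main__":
    base, total, n_chips = pixel_budget()
    print(f"baseline pixels: {base}, with overhead: {total:.0f}, "
          f"equivalent 512x512 chips: {n_chips:.2f}")
    demo_boxes = [(10, 10, 60, 60), (700, 400, 760, 470)]
    print(chips_for_scale(demo_boxes, 1000, 600, 1.0, (0, 512)))
```

Under these assumptions, the stated budget corresponds to roughly five 512x512 chips per image on average. Because the selector only emits chips where valid instances exist, a crowded image yields more chips than a sparse one, which is the adaptive behavior the abstract describes.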

NeurIPS 2018


Results from the Paper


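Task              Dataset        Model                Metric Name                  Metric Value  Global Rank
Object Detection  COCO test-dev  SNIPER (ResNet-101)  box mAP                      46.1          # 124
Object Detection  COCO test-dev  SNIPER (ResNet-101)  AP50                         67.0          # 56
Object Detection  COCO test-dev  SNIPER (ResNet-101)  AP75                         51.6          # 53
Object Detection  COCO test-dev  SNIPER (ResNet-101)  APS                          29.6          # 49
Object Detection  COCO test-dev  SNIPER (ResNet-101)  APM                          48.9          # 61
Object Detection  COCO test-dev  SNIPER (ResNet-101)  APL                          58.1          # 61
Object Detection  COCO test-dev  SNIPER (ResNet-101)  Hardware Burden              29G           # 1
Object Detection  COCO test-dev  SNIPER (ResNet-101)  Operations per network pass  None          # 1
Object Detection  COCO test-dev  SNIPER (ResNet-50)   box mAP                      43.5          # 152
Object Detection  COCO test-dev  SNIPER (ResNet-50)   AP50                         65.0          # 77
Object Detection  COCO test-dev  SNIPER (ResNet-50)   AP75                         48.6          # 83
Object Detection  COCO test-dev  SNIPER (ResNet-50)   APS                          26.1          # 86
Object Detection  COCO test-dev  SNIPER (ResNet-50)   APM                          46.3          # 98
Object Detection  COCO test-dev  SNIPER (ResNet-50)   APL                          56.0          # 91
Object Detection  COCO test-dev  SNIPER (ResNet-50)   Hardware Burden              29G           # 1
Object Detection  COCO test-dev  SNIPER (ResNet-50)   Operations per network pass  None          # 1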

Methods