SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection)...
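To make the "decreasing resolutions" in the abstract concrete, here is a minimal sketch of how feature map sizes shrink in a conventional scale-decreased backbone. It assumes each level l halves the resolution of the previous one (stride 2^l), which is the usual ResNet-style convention. The function name and the choice of levels 1 to 5 are illustrative assumptions, not taken from the paper.

```python
def scale_decreased_resolutions(input_size, levels=(1, 2, 3, 4, 5)):
    """Map each feature level to its spatial side length.

    Assumes a square input and a stride of 2**level at each level,
    as in a typical scale-decreased (ResNet-style) backbone.
    """
    return {level: input_size // (2 ** level) for level in levels}


if __name__ == "__main__":
    # 640x640 is one of the input sizes listed in the results on this page.
    for level, side in scale_decreased_resolutions(640).items():
        print(f"level {level}: stride {2 ** level}, feature map {side}x{side}")
```

Under these assumptions, only the coarsest feature maps reach the deepest layers, which is the limitation for localization that the abstract points to.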

CVPR 2020
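| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Real-Time Object Detection | COCO | SpineNet-49 | MAP | 45.3 | #5 |
| Real-Time Object Detection | COCO | SpineNet-49 (RetinaNet, single-scale, with TRT) | inference time (ms) | 34.3 | #9 |
| Instance Segmentation | COCO minival | RetinaNet (SpineNet-190, 1536x1536) | mask AP | 46.1 | #3 |
| Object Detection | COCO minival | RetinaNet (SpineNet-190, 1536x1536) | box AP | 52.2 | #5 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-96, 1024x1024) | box AP | 48.6 | #26 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-96, 1024x1024) | AP50 | 68.4 | #24 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-96, 1024x1024) | AP75 | 52.5 | #34 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-96, 1024x1024) | APS | 32 | #25 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-96, 1024x1024) | APM | 52.3 | #29 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-96, 1024x1024) | APL | 62 | #23 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-190, 1280x1280) | box AP | 52.1 | #14 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-190, 1280x1280) | AP50 | 71.8 | #11 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-190, 1280x1280) | AP75 | 56.5 | #18 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-190, 1280x1280) | APS | 35.4 | #12 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-190, 1280x1280) | APM | 55 | #16 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-190, 1280x1280) | APL | 63.6 | #16 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 640x640) | box AP | 44.3 | #51 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 640x640) | AP50 | 63.8 | #51 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 640x640) | AP75 | 47.6 | #58 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 640x640) | APS | 25.9 | #54 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 640x640) | APM | 47.7 | #52 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 640x640) | APL | 61.1 | #27 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-143, 1280x1280) | box AP | 50.7 | #18 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-143, 1280x1280) | AP50 | 70.4 | #14 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-143, 1280x1280) | AP75 | 54.9 | #23 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-143, 1280x1280) | APS | 33.6 | #19 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-143, 1280x1280) | APM | 53.9 | #21 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-143, 1280x1280) | APL | 62.1 | #22 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49S, 640x640) | box AP | 41.5 | #68 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49S, 640x640) | AP50 | 60.5 | #69 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49S, 640x640) | AP75 | 44.6 | #75 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49S, 640x640) | APS | 23.3 | #70 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49S, 640x640) | APM | 45 | #68 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49S, 640x640) | APL | 58 | #40 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 896x896) | box AP | 46.7 | #35 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 896x896) | AP50 | 66.3 | #34 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 896x896) | AP75 | 50.6 | #43 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 896x896) | APS | 29.1 | #38 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 896x896) | APM | 50.1 | #38 |
| Object Detection | COCO test-dev | RetinaNet (SpineNet-49, 896x896) | APL | 61.7 | #24 |
| Instance Segmentation | COCO test-dev | Mask R-CNN (SpineNet-190, 1536x1536) | mask AP | 46.1 | #4 |
| Image Classification | ImageNet | SpineNet-143 | Top 1 Accuracy | 79% | #101 |
| Image Classification | ImageNet | SpineNet-143 | Top 5 Accuracy | 94.4% | #66 |
| Image Classification | ImageNet | SpineNet-143 | Number of params | 60.5M | #39 |
| Image Classification | iNaturalist | SpineNet-143 | Top 1 Accuracy | 63.6% | #3 |
| Image Classification | iNaturalist | SpineNet-143 | Top 5 Accuracy | 84.8% | #2 |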

Methods used in the Paper


| Method | Type |
|---|---|
| Cosine Annealing | Learning Rate Schedules |
| Entropy Regularization | Regularization |
| PPO | Policy Gradient Methods |
| Neural Architecture Search | Neural Architecture Search |
| NAS-FPN | Feature Extractors |
| Tanh Activation | Activation Functions |
| Residual Connection | Skip Connections |
| Average Pooling | Pooling Operations |
| Sigmoid Activation | Activation Functions |
| LSTM | Recurrent Neural Networks |
| Bottleneck Residual Block | Skip Connection Blocks |
| Residual Block | Skip Connection Blocks |
| Kaiming Initialization | Initialization |
| 1x1 Convolution | Convolutions |
| Max Pooling | Pooling Operations |
| Random Horizontal Flip | Image Data Augmentation |
| Random Resized Crop | Image Data Augmentation |
| ResNet | Convolutional Neural Networks |
| Focal Loss | Loss Functions |
| FPN | Feature Extractors |
| RetinaNet | Object Detection Models |
| Convolution | Convolutions |
| RoIAlign | RoI Feature Extractors |
| Mask R-CNN | Instance Segmentation Models |
| Dense Connections | Feedforward Networks |
| Global Average Pooling | Pooling Operations |
| Softmax | Output Functions |
| Stochastic Depth | Regularization |
| Weight Decay | Regularization |
| SGD with Momentum | Stochastic Optimization |
| Batch Normalization | Normalization |
| Linear Warmup With Cosine Annealing | Learning Rate Schedules |
| Swish | Activation Functions |
| ReLU | Activation Functions |
| SpineNet | Convolutional Neural Networks |