FSAF

Last updated on Feb 23, 2021

FSAF (R-101, 1x, MS train=N, ignore range=0.2-0.2)

Memory (M) 5080.0
inference time (s/im) 0.09259
File Size 211.89 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture ResNet, FPN, FSAF, Focal Loss
MS train N
lr sched 1x
Memory (M) 5080.0
Backbone Layers 101
train time (s/iter) 0.58
inference time (s/im) 0.09259
FSAF (R-50, 1x, MS train=N, ignore range=0.2-0.2)

Memory (M) 3150.0
inference time (s/im) 0.07692
File Size 139.19 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture ResNet, FPN, FSAF, Focal Loss
MS train N
lr sched 1x
Memory (M) 3150.0
Backbone Layers 50
train time (s/iter) 0.43
inference time (s/im) 0.07692
FSAF (R-50, 1x, MS train=N, ignore range=0.2-0.5)

Memory (M) 3150.0
inference time (s/im) 0.0813
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture ResNet, FPN, FSAF, Focal Loss
MS train N
lr sched 1x
Memory (M) 3150.0
Backbone Layers 50
train time (s/iter) 0.43
inference time (s/im) 0.0813
FSAF (X-101, 1x, MS train=N, ignore range=0.2-0.2)

Memory (M) 9380.0
inference time (s/im) 0.17857
File Size 360.68 MB
Training Data MS COCO
Training Resources 8x NVIDIA V100 GPUs
Training Time

Architecture ResNeXt, FSAF, FPN, Focal Loss
MS train N
lr sched 1x
Memory (M) 9380.0
Backbone Layers 101
train time (s/iter) 1.23
inference time (s/im) 0.17857
README.md

Feature Selective Anchor-Free Module for Single-Shot Object Detection


FSAF is an anchor-free method published at CVPR 2019 (https://arxiv.org/pdf/1903.00621.pdf). It is equivalent to an anchor-based method with a single anchor at each feature map position in each FPN level, and that is how it is implemented here. Only the anchor-free branch is released, because it is more compatible with the current framework and has a lower computational cost.
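To make the "one anchor per position per FPN level" view concrete, here is a minimal sketch in plain Python. It is not the released implementation: the function name `single_anchor_grid`, the strides, the 256x256 input size, the anchor side length (equal to the stride) and the half-cell centre offset are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def single_anchor_grid(feat_h: int, feat_w: int, stride: int,
                       scale: float = 1.0) -> List[Box]:
    """Return exactly one square anchor per feature-map cell.

    Assumption: the anchor side is `scale * stride` and it is centred
    on the cell; a real framework may use different sizes or offsets.
    """
    half = scale * stride / 2.0
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride  # cell centre in image coordinates
            cy = (row + 0.5) * stride
            anchors.append((cx - half, cy - half, cx + half, cy + half))
    return anchors


# Hypothetical FPN strides and a hypothetical 256x256 input image.
strides = [8, 16, 32, 64, 128]
anchors_per_level: Dict[int, List[Box]] = {
    s: single_anchor_grid(256 // s, 256 // s, s) for s in strides
}
```

With a single anchor per cell, the number of anchors on a level equals the number of feature-map positions on that level, which is what lets an anchor-based pipeline stand in for the anchor-free branch.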

In the original paper, feature map locations within the central 0.2-0.5 area of a ground-truth (gt) box are tagged as ignored. However, we found empirically that a hard threshold (0.2-0.2) gives a further performance gain (see the table below).
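The difference between the two ignore ranges can be sketched as follows. This is an illustrative reading, not the released code: it assumes that locations inside the gt box shrunk to 0.2 of its size are positive, that locations between the 0.2 and 0.5 shrunk boxes are ignored, and that everything else is negative. The helper names `shrink_box`, `contains` and `label_location` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def shrink_box(box: Box, scale: float) -> Box:
    """Shrink a box around its centre so its width and height are scaled."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def contains(box: Box, x: float, y: float) -> bool:
    """True if point (x, y) lies inside the box (borders included)."""
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def label_location(x: float, y: float, gt_box: Box,
                   pos_scale: float = 0.2,
                   ignore_scale: float = 0.5) -> str:
    """Label one location as 'positive', 'ignore' or 'negative'."""
    if contains(shrink_box(gt_box, pos_scale), x, y):
        return "positive"
    if contains(shrink_box(gt_box, ignore_scale), x, y):
        return "ignore"
    return "negative"


gt = (0.0, 0.0, 100.0, 100.0)
paper_setting = label_location(30.0, 50.0, gt, 0.2, 0.5)  # inside the ignore band
hard_setting = label_location(30.0, 50.0, gt, 0.2, 0.2)   # no ignore band exists
```

Under this reading, setting both thresholds to 0.2 makes the ignore band empty, so a location that the paper setting would skip is treated as a negative instead.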

Main Results

Results on R50/R101/X101-FPN

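| Backbone | ignore range | ms-train | Lr schd | Train Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Config | Download |
|----------|--------------|----------|---------|----------------|---------------------|----------------|--------|--------|----------|
| R-50 | 0.2-0.5 | N | 1x | 3.15 | 0.43 | 12.3 | 36.0 (35.9) | | model \| log |
| R-50 | 0.2-0.2 | N | 1x | 3.15 | 0.43 | 13.0 | 37.4 | config | model \| log |
| R-101 | 0.2-0.2 | N | 1x | 5.08 | 0.58 | 10.8 | 39.3 (37.9) | config | model \| log |
| X-101 | 0.2-0.2 | N | 1x | 9.38 | 1.23 | 5.6 | 42.4 (41.0) | config | model \| log |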

Notes:

  • 1x means the model is trained for 12 epochs.
  • AP values in the brackets represent those reported in the original paper.
  • All results are obtained with a single model and single-scale test.
  • X-101 backbone represents ResNeXt-101-64x4d.
  • All pretrained backbones use PyTorch style.
  • All models are trained on 8 Titan-XP GPUs and tested on a single GPU.
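The results table reports inference speed in frames per second, while the per-model summaries list inference time in seconds per image. The two are reciprocals, as this small sketch shows; the function name `seconds_per_image` and the dictionary keys are illustrative only, and the fps values are copied from the table.

```python
def seconds_per_image(fps: float) -> float:
    """Convert throughput (images per second) to latency (seconds per image)."""
    return 1.0 / fps


# Inference speeds (fps) as listed in the results table.
table_fps = {
    "R-50 (0.2-0.5)": 12.3,
    "R-50 (0.2-0.2)": 13.0,
    "R-101 (0.2-0.2)": 10.8,
    "X-101 (0.2-0.2)": 5.6,
}

for name, fps in table_fps.items():
    print(f"{name}: {seconds_per_image(fps):.5f} s/im")
```

Rounded to five decimal places, the converted values agree with the s/im figures in the model summaries.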

Citations

BibTeX reference is as follows.

@inproceedings{zhu2019feature,
  title={Feature Selective Anchor-Free Module for Single-Shot Object Detection},
  author={Zhu, Chenchen and He, Yihui and Savvides, Marios},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={840--849},
  year={2019}
}

Results

Object Detection on COCO minival

MODEL BOX AP
FSAF (X-101, 1x, MS train=N, ignore range=0.2-0.2) 42.4
FSAF (R-101, 1x, MS train=N, ignore range=0.2-0.2) 39.3
FSAF (R-50, 1x, MS train=N, ignore range=0.2-0.2) 37.4
FSAF (R-50, 1x, MS train=N, ignore range=0.2-0.5) 36.0