Mask R-CNN

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron
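The parallel-branch idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation or the Detectron API: the class `ToyRoIHeads`, its parameters, and the use of small random fully connected layers in place of real convolutional heads are all assumptions made for illustration. It shows only that the mask branch reads the same per-region features as the box branch and does not depend on the box branch's outputs.

```python
import math
import random


def make_linear(rng, n_in, n_out):
    """Return random (weights, bias) for a toy fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def apply_linear(layer, x):
    """Compute weights @ x + bias using plain Python lists."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyRoIHeads:
    """Hypothetical per-region heads: a box branch and a parallel mask branch."""

    def __init__(self, feat_dim=16, num_classes=3, mask_size=4, seed=0):
        rng = random.Random(seed)
        self.num_classes = num_classes
        self.mask_size = mask_size
        # Box branch: class scores and four box regression offsets.
        self.cls_layer = make_linear(rng, feat_dim, num_classes)
        self.box_layer = make_linear(rng, feat_dim, 4)
        # Mask branch (assumed layout): one mask_size x mask_size mask per class.
        self.mask_layer = make_linear(rng, feat_dim, num_classes * mask_size * mask_size)

    def forward(self, roi_feature):
        # Both branches consume the same region feature vector.
        class_scores = apply_linear(self.cls_layer, roi_feature)
        box_deltas = apply_linear(self.box_layer, roi_feature)
        logits = apply_linear(self.mask_layer, roi_feature)
        m = self.mask_size
        # Reshape the flat logits into per-class masks of per-pixel probabilities.
        masks = [
            [[sigmoid(logits[(k * m + i) * m + j]) for j in range(m)] for i in range(m)]
            for k in range(self.num_classes)
        ]
        return {"class_scores": class_scores, "box_deltas": box_deltas, "masks": masks}


heads = ToyRoIHeads()
roi_feature = [0.5] * 16  # stand-in for features pooled from one region
out = heads.forward(roi_feature)

# Pick the mask belonging to the highest-scoring class for this region.
best_class = max(range(heads.num_classes), key=lambda k: out["class_scores"][k])
instance_mask = out["masks"][best_class]
```

In this sketch the extra cost of the mask output is a single additional layer on shared features, which loosely mirrors the abstract's claim that the mask branch adds only a small overhead.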

Venue: ICCV 2017

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Instance Segmentation | BDD100K val | Mask R-CNN | AP | 20.5 | #3 |
| Nuclear Segmentation | Cell17 | Mask R-CNN | F1-score | 0.8004 | #2 |
| | | | Dice | 0.707 | #2 |
| | | | Hausdorff | 12.6723 | #2 |
| Panoptic Segmentation | Cityscapes val | Mask R-CNN+COCO | PQth | 54.0 | #19 |
| Object Detection | COCO minival | Mask R-CNN (ResNet-50-FPN) | box AP | 37.7 | #188 |
| Object Detection | COCO minival | Mask R-CNN (ResNet-101-FPN) | box AP | 40.0 | #170 |
| Object Detection | COCO minival | Mask R-CNN (ResNeXt-101-FPN) | box AP | 36.7 | #190 |
| | | | AP50 | 59.5 | #80 |
| | | | AP75 | 38.9 | #91 |
| Object Detection | COCO-O | Mask R-CNN (ResNet-50) | Average mAP | 17.1 | #35 |
| | | | Effective Robustness | -0.11 | #37 |
| Keypoint Detection | COCO test-challenge | Mask R-CNN* | AR | 75.4 | #5 |
| | | | ARM | 70.2 | #5 |
| | | | AP | 68.9 | #6 |
| | | | AP50 | 89.2 | #5 |
| | | | AP75 | 75.2 | #5 |
| | | | APL | 82.6 | #4 |
| | | | AR50 | 93.2 | #5 |
| | | | AR75 | 81.2 | #5 |
| | | | ARL | 76.8 | #5 |
| Object Detection | COCO test-dev | Mask R-CNN (ResNeXt-101-FPN) | box mAP | 39.8 | #182 |
| | | | AP50 | 62.3 | #102 |
| | | | AP75 | 43.4 | #128 |
| | | | APS | 22.1 | #116 |
| | | | APM | 43.2 | #116 |
| | | | APL | 51.2 | #122 |
| | | | Hardware Burden | 9G | #1 |
| Object Detection | COCO test-dev | Mask R-CNN (ResNet-101-FPN) | box mAP | 38.2 | #195 |
| | | | AP50 | 60.3 | #121 |
| | | | AP75 | 41.7 | #136 |
| | | | APS | 20.1 | #130 |
| | | | APM | 41.1 | #128 |
| | | | APL | 50.2 | #131 |
| | | | Hardware Burden | 9G | #1 |
| Instance Segmentation | COCO test-dev | Mask R-CNN (ResNeXt-101-FPN) | mask AP | 37.1 | #91 |
| | | | AP50 | 60.0 | #30 |
| | | | AP75 | 39.4 | #29 |
| | | | APS | 16.9 | #34 |
| | | | APM | 39.9 | #31 |
| | | | APL | 53.5 | #22 |
| Keypoint Detection | COCO test-dev | Mask R-CNN | APL | 71.4 | #13 |
| | | | APM | 57.8 | #14 |
| | | | AP50 | 87.3 | #9 |
| | | | AP75 | 68.7 | #11 |
| Pose Estimation | COCO test-dev | Mask-RCNN | AP | 63.1 | #42 |
| | | | AP50 | 87.3 | #35 |
| | | | AP75 | 68.7 | #39 |
| | | | APL | 71.4 | #33 |
| Multi-Person Pose Estimation | CrowdPose | Mask R-CNN | mAP @0.5:0.95 | 57.2 | #21 |
| | | | AP Easy | 69.4 | #17 |
| | | | AP Medium | 57.9 | #19 |
| | | | AP Hard | 45.8 | #17 |
| Keypoint Estimation | GRIT | Mask R-CNN | Keypoint (ablation) | 70.8 | #1 |
| | | | Keypoint (test) | 70.6 | #1 |
| Object Localization | GRIT | Mask R-CNN | Localization (ablation) | 44.7 | #3 |
| | | | Localization (test) | 45.1 | #3 |
| Object Segmentation | GRIT | Mask R-CNN | Segmentation (ablation) | 26.2 | #2 |
| | | | Segmentation (test) | 26.2 | #2 |
| Object Detection | iSAID | Mask-RCNN+ | Average Precision | 37.18 | #4 |
| Object Detection | iSAID | Mask-RCNN | Average Precision | 36.50 | #5 |
| Instance Segmentation | iSAID | Mask-RCNN | Average Precision | 36.50 | #4 |
| Instance Segmentation | iSAID | Mask-RCNN+ | Average Precision | 37.18 | #3 |
| Multi-tissue Nucleus Segmentation | Kumar | Mask R-CNN (e) | Dice | 0.760 | #17 |
| | | | Hausdorff Distance (mm) | 50.9 | #12 |
| Multi-Human Parsing | MHP v1.0 | Mask R-CNN | AP 0.5 | 52.68% | #2 |
| Multi-Human Parsing | MHP v2.0 | Mask R-CNN | AP 0.5 | 14.9% | #5 |
| Real-Time Object Detection | MS COCO | Mask R-CNN X-152-32x8d | box AP | 45.2 | #40 |
| Keypoint Detection | MS COCO | Mask R-CNN | Validation AP | 69.2 | #12 |
| | | | Test AP | 63.1 | #15 |
| Multi-Person Pose Estimation | OCHuman | Mask R-CNN | Validation AP | 20.2 | #7 |
| | | | AP50 | 33.2 | #8 |
| | | | AP75 | 24.5 | #8 |
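For the box and mask rows evaluated COCO-style, AP50 and AP75 count a prediction as correct when its intersection-over-union (IoU) with a ground-truth instance reaches 0.50 or 0.75 respectively; keypoint rows use object keypoint similarity instead of IoU. The sketch below illustrates only that matching criterion for binary masks, not the full AP computation (which also ranks predictions by score and integrates a precision-recall curve). The helper names `mask_iou` and `is_match` and the tiny example masks are hypothetical.

```python
def mask_iou(pred, truth):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p & t
            union += p | t
    # Two empty masks have no overlap to measure; treat that case as IoU 0.
    return intersection / union if union else 0.0


def is_match(iou, threshold):
    """A prediction counts as a true positive when IoU meets the threshold."""
    return iou >= threshold


# Predicted mask covers 4 pixels; ground truth covers those 4 plus 2 more.
pred = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
truth = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
]

iou = mask_iou(pred, truth)        # 4 / 6
match_at_50 = is_match(iou, 0.50)  # True: counts toward AP50
match_at_75 = is_match(iou, 0.75)  # False: does not count toward AP75
```

The same prediction can therefore help AP50 while hurting AP75, which is why the stricter metric is consistently lower in the rows above.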

Methods