SlowFast Networks for Video Recognition

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pinpointed as contributions of our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks: Kinetics, Charades and AVA. Code has been made available at: https://github.com/facebookresearch/SlowFast
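The two-pathway idea in the abstract can be illustrated with a minimal sketch of how the same raw clip might be sampled at two frame rates, and how the Fast pathway's channel width might be reduced. This is not the authors' implementation: the function names, the parameters `slow_stride`, `alpha` and `beta_inverse`, and the numbers used (a 64-frame clip, stride 16, an 8x denser Fast pathway with 1/8 of the channels) are assumptions chosen for illustration only.

```python
def pathway_frame_indices(num_raw_frames, slow_stride, alpha):
    """Return (slow_indices, fast_indices) into a raw clip.

    slow_stride: temporal stride of the Slow pathway (hypothetical name).
    alpha: how many times denser the Fast pathway samples in time
           (assumed here to divide slow_stride evenly).
    """
    if slow_stride % alpha != 0:
        raise ValueError("alpha must divide slow_stride in this sketch")
    fast_stride = slow_stride // alpha
    # Slow pathway: few frames, sampled sparsely (low frame rate).
    slow = list(range(0, num_raw_frames, slow_stride))
    # Fast pathway: many frames, sampled densely (high frame rate).
    fast = list(range(0, num_raw_frames, fast_stride))
    return slow, fast


def fast_channels(slow_channels, beta_inverse):
    """Channel width of the Fast pathway as a 1/beta_inverse fraction of Slow."""
    return max(1, slow_channels // beta_inverse)


# Illustrative values only (assumptions, not taken from the text above).
slow_idx, fast_idx = pathway_frame_indices(num_raw_frames=64, slow_stride=16, alpha=8)
print(slow_idx)                      # [0, 16, 32, 48]
print(len(slow_idx), len(fast_idx))  # 4 32
print(fast_channels(256, 8))         # 32
```

With these assumed values the Fast pathway sees eight times as many frames as the Slow pathway but carries one eighth of the channels, which is the sense in which it can stay lightweight while still observing motion at fine temporal resolution.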

ICCV 2019

Results from the Paper


Task                   Dataset       Model                                             Metric          Value  Global Rank
Action Recognition     AVA v2.1      SlowFast (Kinetics-400 pretraining)               mAP (Val)       26.3   #9
Action Recognition     AVA v2.1      SlowFast++ (Kinetics-600 pretraining, NL)         mAP (Val)       28.3   #4
Action Recognition     AVA v2.1      SlowFast (Kinetics-600 pretraining, NL)           mAP (Val)       27.3   #7
Action Recognition     AVA v2.1      SlowFast (Kinetics-600 pretraining)               mAP (Val)       26.8   #8
Action Recognition     AVA v2.2      SlowFast, 4x16, R50 (Kinetics-400 pretraining)    mAP             21.9   #38
Action Recognition     AVA v2.2      SlowFast, 8x8 R101+NL (Kinetics-600 pretraining)  mAP             27.1   #31
Action Classification  Charades      SlowFast (Kinetics-400 pretraining, NL)           mAP             42.5   #25
Action Classification  Charades      SlowFast (Kinetics-600 pretraining, NL)           mAP             45.2   #18
Action Classification  Charades      SlowFast (Kinetics-600 pretraining)               mAP             42.1   #27
Action Classification  Kinetics-400  SlowFast 16x8 (ResNet-101 + NL)                   Acc@1           79.8   #97
Action Classification  Kinetics-400  SlowFast 16x8 (ResNet-101 + NL)                   Acc@5           93.9   #81
Action Classification  Kinetics-400  SlowFast 16x8 (ResNet-101)                        Acc@1           78.9   #113
Action Classification  Kinetics-400  SlowFast 16x8 (ResNet-101)                        Acc@5           93.5   #91
Action Classification  Kinetics-400  SlowFast 8x8 (ResNet-101)                         Acc@1           77.9   #123
Action Classification  Kinetics-400  SlowFast 8x8 (ResNet-101)                         Acc@5           93.2   #97
Action Classification  Kinetics-400  SlowFast 8x8 (ResNet-50)                          Acc@1           77.0   #137
Action Classification  Kinetics-400  SlowFast 8x8 (ResNet-50)                          Acc@5           92.6   #103
Action Classification  Kinetics-400  SlowFast 4x16 (ResNet-50)                         Acc@1           75.6   #147
Action Classification  Kinetics-400  SlowFast 4x16 (ResNet-50)                         Acc@5           92.1   #107
Action Classification  Kinetics-600  SlowFast 16x8 (ResNet-101 + NL)                   Top-1 Accuracy  81.8   #46
Action Classification  Kinetics-600  SlowFast 16x8 (ResNet-101 + NL)                   Top-5 Accuracy  95.1   #39
Action Classification  Kinetics-600  SlowFast 16x8 (ResNet-101)                        Top-1 Accuracy  81.1   #49
Action Classification  Kinetics-600  SlowFast 16x8 (ResNet-101)                        Top-5 Accuracy  95.1   #39
Action Classification  Kinetics-600  SlowFast 8x8 (ResNet-101)                         Top-1 Accuracy  80.4   #51
Action Classification  Kinetics-600  SlowFast 8x8 (ResNet-101)                         Top-5 Accuracy  94.8   #42
Action Classification  Kinetics-600  SlowFast 8x8 (ResNet-50)                          Top-1 Accuracy  79.9   #52
Action Classification  Kinetics-600  SlowFast 8x8 (ResNet-50)                          Top-5 Accuracy  94.5   #43
Action Classification  Kinetics-600  SlowFast 4x16 (ResNet-50)                         Top-1 Accuracy  78.8   #54
Action Classification  Kinetics-600  SlowFast 4x16 (ResNet-50)                         Top-5 Accuracy  94.0   #44

Results from Other Papers


Task                   Dataset                 Model                                              Metric          Value  Rank
Action Recognition     AVA v2.2                SlowFast, 8x8, R101 (Kinetics-400 pretraining)     mAP             23.8   #37
Action Recognition     AVA v2.2                SlowFast, 16x8 R101+NL (Kinetics-600 pretraining)  mAP             27.5   #28
Action Recognition     Diving-48               SlowFast                                           Accuracy        77.6   #11
Action Recognition     Something-Something V2  SlowFast                                           Top-1 Accuracy  61.7   #105
