Finding Action Tubes with a Sparse-to-Dense Framework

30 Aug 2020  ·  Yuxi Li, Weiyao Lin, Tao Wang, John See, Rui Qian, Ning Xu, Li-Min Wang, Shugong Xu

The task of spatiotemporal action detection has attracted increasing attention among researchers. Dominant existing methods solve this problem by relying on short-term information and performing dense, serial detection on each individual frame or clip. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. The framework has two key characteristics: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network; (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21, and UCFSports benchmark datasets, achieving promising results competitive with state-of-the-art methods. The proposed sparse-to-dense strategy makes our framework about 7.6 times more efficient than the nearest competitor.
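To make the sparse-to-dense idea concrete, here is a minimal, hypothetical sketch of how such a pipeline could densify sparse detections into a frame-level action tube: boxes are detected on a few sparsely sampled key frames, then linearly interpolated across the remaining frames. The function name, shapes, and interpolation scheme are illustrative assumptions for exposition only, not the paper's DTS module.

```python
# Hypothetical sparse-to-dense densification: given boxes detected on a few
# sparsely sampled key frames, produce a box for every frame in the clip by
# linear interpolation. Names and shapes are assumptions, not the authors' code.
import numpy as np

def sparse_to_dense_tube(key_indices, key_boxes, num_frames):
    """Interpolate sparse per-key-frame boxes into a dense action tube.

    key_indices: sorted frame indices where boxes were detected, e.g. [0, 8, 16]
    key_boxes:   array of shape (K, 4) with (x1, y1, x2, y2) per key frame
    num_frames:  total frames in the clip; returns an array of shape (num_frames, 4)
    """
    key_indices = np.asarray(key_indices, dtype=float)
    key_boxes = np.asarray(key_boxes, dtype=float)
    frames = np.arange(num_frames, dtype=float)
    # Interpolate each of the four box coordinates independently over time.
    dense = np.stack(
        [np.interp(frames, key_indices, key_boxes[:, c]) for c in range(4)],
        axis=1,
    )
    return dense

# Example: 3 key-frame detections densified over a 17-frame clip.
tube = sparse_to_dense_tube(
    key_indices=[0, 8, 16],
    key_boxes=[[10, 10, 50, 80], [14, 12, 54, 82], [20, 15, 60, 85]],
    num_frames=17,
)
print(tube.shape)  # (17, 4): one box per frame
```

The appeal of this scheme, and plausibly of the sparse-to-dense strategy in general, is that the expensive detector runs only on the sparse key frames while the dense tube is recovered by cheap interpolation, which is where the claimed efficiency gain comes from.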


Results from the Paper


Ranked #3 on Action Detection on UCF Sports (Video-mAP 0.2 metric)

Task              Dataset     Model  Metric Name    Metric Value  Global Rank
Action Detection  J-HMDB      DTS    Video-mAP 0.2  76.1          #8
Action Detection  J-HMDB      DTS    Video-mAP 0.5  74.3          #10
Action Detection  UCF101-24   DTS    Video-mAP 0.5  54.0          #4
Action Detection  UCF Sports  DTS    Video-mAP 0.2  94.3          #3
Action Detection  UCF Sports  DTS    Video-mAP 0.5  93.8          #4
