Action Recognition With Motion Diversification and Dynamic Selection

Motion modeling is crucial in modern action recognition methods. Because motion dynamics such as tempo and action amplitude can vary widely across video clips, adaptively capturing the proper motion information is challenging. To address this issue, we introduce a Motion Diversification and Selection (MoDS) module that generates diversified spatio-temporal motion features and then dynamically selects the suitable motion representation for categorizing the input video. Specifically, we first propose a spatio-temporal motion generation (StMG) module to construct a bank of diversified motion features with varying spatial neighborhoods and time ranges. Then, a dynamic motion selection (DMS) module chooses the most discriminative motion feature, both spatially and temporally, from the feature bank. As a result, our method makes full use of the diversified spatio-temporal motion information while maintaining computational efficiency at the inference stage. Extensive experiments on five widely used benchmarks demonstrate the effectiveness of the method, and we achieve state-of-the-art performance on Something-Something V1 & V2, which exhibit large motion variation.
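The generate-then-select idea described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so every function name, the box-average used for the spatial neighborhood, the frame-difference used for the time range, and the magnitude-based selection rule (standing in for whatever learned selection the DMS module uses) are assumptions made purely for illustration.

```python
import numpy as np

def box_blur(x, radius):
    """Average each (H, W) map in x over a (2r+1)x(2r+1) window, edge-padded."""
    if radius == 0:
        return x
    k = 2 * radius + 1
    padded = np.pad(x, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + x.shape[1], dx:dx + x.shape[2]]
    return out / (k * k)

def build_motion_bank(feats, strides=(1, 2), radii=(0, 1)):
    """Hypothetical stand-in for StMG.

    feats: array of shape (T, H, W), one feature map per frame.
    Returns a bank of shape (K, T, H, W) with K = len(strides) * len(radii).
    Each candidate is (spatially averaged features at t + stride) - (features at t).
    """
    T = feats.shape[0]
    bank = []
    for tau in strides:
        # Index of the later frame, clamped so the last frames compare to frame T-1.
        later = np.minimum(np.arange(T) + tau, T - 1)
        for r in radii:
            bank.append(box_blur(feats[later], r) - feats)
    return np.stack(bank)

def select_motion(bank):
    """Hypothetical stand-in for DMS.

    Picks, at every (t, h, w) position, the candidate with the largest
    absolute response. A learned selector would replace this heuristic.
    """
    idx = np.abs(bank).argmax(axis=0)                        # (T, H, W)
    chosen = np.take_along_axis(bank, idx[None], axis=0)[0]  # (T, H, W)
    return chosen, idx
```

Calling `build_motion_bank` on a `(T, H, W)` array with the default arguments yields four candidates (two strides times two radii), and `select_motion` returns one motion map of shape `(T, H, W)` together with the index of the chosen candidate at each position. Because only one candidate is kept per position, downstream layers see a single motion map rather than the whole bank.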

Task               | Dataset                | Model              | Metric Name    | Metric Value | Global Rank
Action Recognition | Something-Something V1 | MoDS (8+16 frames) | Top-1 Accuracy | 56.6         | #18
Action Recognition | Something-Something V2 | MoDS (8+16 frames) | Top-1 Accuracy | 67.1         | #69
