MiniROAD: Minimal RNN Framework for Online Action Detection

Online Action Detection (OAD) is the task of identifying actions in streaming videos without access to future frames. Much effort has been devoted to effectively capturing long-range dependencies, with transformers receiving the spotlight for their ability to capture long-range temporal structures. In contrast, RNNs have received less attention lately because they perform worse than recent transformer-based methods. In this paper, we investigate the underlying reasons for the inferior performance of RNNs compared to transformer-based algorithms. Our findings indicate that the discrepancy between training and inference is the primary hindrance to the effective training of RNNs. To address this, we propose applying non-uniform weights to the loss computed at each time step, which allows the RNN model to learn from the predictions made in an environment that better resembles the inference stage. Extensive experiments on three benchmark datasets, THUMOS, TVSeries, and FineAction, demonstrate that a minimal RNN-based model trained with the proposed methodology performs on par with or better than the existing best methods, with a significant increase in efficiency. The code is available at https://github.com/jbistanbul/MiniROAD.
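
To make the idea of non-uniform per-step loss weights concrete, here is a minimal sketch in plain Python. The abstract does not specify the weighting scheme, so the linearly increasing weights below are an assumption chosen only for illustration, and the names `step_weights`, `cross_entropy`, and `weighted_sequence_loss` are hypothetical rather than taken from the MiniROAD code.

```python
import math


def step_weights(num_steps):
    """Hypothetical weighting (an assumption, not the paper's scheme):
    weights grow linearly with the time step and are normalized to sum to 1."""
    raw = [t + 1 for t in range(num_steps)]
    total = sum(raw)
    return [r / total for r in raw]


def cross_entropy(logits, label):
    """Cross-entropy for one time step, computed from raw logits.
    Uses the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def weighted_sequence_loss(logits_seq, labels, weights=None):
    """Weighted sum of per-step cross-entropy losses over one training clip.

    logits_seq: list of per-step logit lists (one list per time step)
    labels:     list of integer class labels, one per time step
    weights:    optional per-step weights; defaults to step_weights(...)
    """
    if len(logits_seq) != len(labels):
        raise ValueError("logits_seq and labels must have the same length")
    if weights is None:
        weights = step_weights(len(labels))
    if len(weights) != len(labels):
        raise ValueError("weights must have one entry per time step")
    return sum(
        w * cross_entropy(step_logits, y)
        for w, step_logits, y in zip(weights, logits_seq, labels)
    )
```

With these assumed weights, an error at a late time step, where the recurrent state has accumulated more context, contributes more to the loss than the same error at the first step. Uniform weights (`[1 / T] * T`) recover the ordinary mean loss over the sequence.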


Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Online Action Detection | FineAction | MiniROAD | mAP | 37.1 | # 1 |
| Online Action Detection | THUMOS'14 | MiniROAD | mAP | 71.8 | # 2 |
| Online Action Detection | THUMOS'14 | MiniROAD | MFLOPs per pred | 15.8 | # 5 |
| Online Action Detection | TVSeries | MiniROAD | mCAP | 89.6 | # 1 |
