SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition

While many action recognition datasets consist of collections of brief, trimmed videos, each containing a relevant action, videos in the real world (e.g., on YouTube) exhibit very different properties: they are often several minutes long, with brief relevant clips interleaved with extended segments containing little change. Densely applying an action recognition system to every temporal clip within such videos is prohibitively expensive...

ICCV 2019

Results from the Paper


TASK                DATASET     MODEL                        METRIC NAME  METRIC VALUE  GLOBAL RANK
Action Recognition  miniSports  IF+MD+RGB-R (ShuffleNet-26)  Accuracy     69.9          #2
Action Recognition  miniSports  IF+MD+RGB-R (ResNet-18)      Accuracy     74.9          #1

Methods used in the Paper