no code implementations • 18 Nov 2021 • Raivo Koot, Markus Hennerbichler, Haiping Lu
Our experiments conclude that current video transformers are not yet capable of lightweight action recognition on par with traditional convolutional baselines, and that the previously mentioned shortcomings need to be addressed to bridge this gap.