NeurIPS 2023 • DongHo Lee, Jongseo Lee, Jinwoo Choi
In this work, we propose a novel two-stream architecture, called Cross-Attention in Space and Time (CAST), that achieves a balanced spatio-temporal understanding of videos using only RGB input.
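The sketch below illustrates the general idea of cross-attention between two streams: each stream's tokens act as queries over the other stream's tokens, and the attended result is added back residually. It assumes one stream carries spatial tokens and the other carries temporal tokens. This is not the authors' CAST implementation. The function names `cross_attend` and `bidirectional_exchange` are hypothetical, and the sketch leaves out learned query/key/value projections, multiple heads, normalization, and any bottleneck design the paper may use.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows are tokens, columns are feature dimensions


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, context: Matrix) -> Matrix:
    """Scaled dot-product attention where each query row attends over context rows.

    Assumption: keys and values are the raw context rows (no learned projections).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted sum of context rows, one output value per feature dimension.
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(d)])
    return out


def bidirectional_exchange(spatial: Matrix, temporal: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical two-way exchange: each stream queries the other, then adds the result back."""
    s_from_t = cross_attend(spatial, temporal)   # spatial tokens gather temporal context
    t_from_s = cross_attend(temporal, spatial)   # temporal tokens gather spatial context
    new_spatial = [[a + b for a, b in zip(r, u)] for r, u in zip(spatial, s_from_t)]
    new_temporal = [[a + b for a, b in zip(r, u)] for r, u in zip(temporal, t_from_s)]
    return new_spatial, new_temporal


if __name__ == "__main__":
    spatial = [[1.0, 0.0], [0.0, 1.0]]   # two toy "spatial" tokens
    temporal = [[1.0, 1.0]]              # one toy "temporal" token
    s, t = bidirectional_exchange(spatial, temporal)
    print(s)  # [[2.0, 1.0], [1.0, 2.0]]
    print(t)  # [[1.5, 1.5]]
```

In the toy run, every spatial token attends to the single temporal token with weight 1. The temporal token splits its attention equally between the two spatial tokens because both have the same dot-product score. Because the exchange runs in both directions, each stream is updated with information from the other. The source describes this kind of two-way exchange at a high level as a way to balance spatial and temporal understanding.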
Ranked #7 on Action Recognition on EPIC-KITCHENS-100