In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies... (read more)
PDFMETHOD | TYPE | |
---|---|---|
![]() |
Working Memory Models |