Search Results for author: Kevin Hsu

Found 1 paper, 0 papers with code

AVT: Audio-Video Transformer for Multimodal Action Recognition

no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar

AVT combines video and audio signals to improve action recognition accuracy, building on the effective spatio-temporal representation learned by a video Transformer.

Ranked #4 on Multi-modal Classification on VGG-Sound

Action Recognition • Audio Classification • +3 more tasks
