Search Results for author: Kevin Hsu

Found 1 paper, 0 papers with code

AVT: Audio-Video Transformer for Multimodal Action Recognition

No code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar

AVT combines video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representation of the video Transformer.

Action Recognition • Audio Classification • +3 more
