1 code implementation • CVPR 2023 • Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen
Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94.1% vs. 94.2%), while its resource costs are significantly lower than those of the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x) and FLOPs (0.6G vs. 2.6G, about 4x).
Audio-Visual Active Speaker Detection • Speaker Diarization
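As a quick check of the efficiency ratios quoted in the CVPR 2023 abstract, the short Python sketch below divides each reported state-of-the-art figure by the corresponding figure for the proposed framework. The dictionary keys and the `reduction_ratio` helper are illustrative names chosen here, not anything from the paper or its code release.

```python
# Figures as reported in the abstract: (proposed framework, state of the art).
# Key names are illustrative only.
reported = {
    "params_M": (1.0, 22.5),  # model parameters, in millions
    "flops_G": (0.6, 2.6),    # FLOPs, in billions
}


def reduction_ratio(ours: float, sota: float) -> float:
    """Return how many times smaller `ours` is than `sota`."""
    return sota / ours


for name, (ours, sota) in reported.items():
    # One decimal place is enough to compare against the rounded claims.
    print(f"{name}: {reduction_ratio(ours, sota):.1f}x")
```

This prints `params_M: 22.5x` and `flops_G: 4.3x`, which the abstract rounds to "about 23x" and "about 4x".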
no code implementations • 2 Nov 2022 • Tongtong Song, Qiang Xu, Haoyu Lu, Longbiao Wang, Hao Shi, Yuqin Lin, Yanbing Yang, Jianwu Dang
It has two stages: the speech awareness (SA) stage and the language fusion (LF) stage.
Automatic Speech Recognition • Automatic Speech Recognition (ASR)