SVFAP: Self-supervised Video Facial Affect Perceiver

31 Dec 2023 · Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, JianHua Tao

Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction. Previous studies mainly focus on developing various deep learning architectures and training them in a fully supervised manner. Although these supervised methods have achieved significant progress, the longstanding lack of large-scale, high-quality labeled data severely hinders their further improvement. Motivated by the recent success of self-supervised learning in computer vision, this paper introduces a self-supervised approach, termed Self-supervised Video Facial Affect Perceiver (SVFAP), to address the dilemma faced by supervised methods. Specifically, SVFAP leverages masked facial video autoencoding to perform self-supervised pre-training on massive unlabeled facial videos. Considering that large spatiotemporal redundancy exists in facial videos, we propose a novel temporal pyramid and spatial bottleneck Transformer as the encoder of SVFAP, which not only incurs low computational cost but also achieves excellent performance. To verify the effectiveness of our method, we conduct experiments on nine datasets spanning three downstream tasks: dynamic facial expression recognition, dimensional emotion recognition, and personality recognition. Comprehensive results demonstrate that SVFAP can learn powerful affect-related representations via large-scale self-supervised pre-training and that it significantly outperforms previous state-of-the-art methods on all datasets. Code will be available at https://github.com/sunlicai/SVFAP.
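To make the idea of masked video autoencoding concrete, the following is a minimal sketch of how a masking pattern over video tokens might be generated. The abstract does not state the masking strategy or ratio, so the tube-style mask (the same spatial positions hidden in every temporal slice), the 0.9 ratio, and the function name `tube_mask` are all assumptions for illustration, not details confirmed by the source.

```python
import random


def tube_mask(num_slices, grid_h, grid_w, mask_ratio=0.9, seed=0):
    """Build a tube-style mask over a (num_slices, grid_h * grid_w) token grid.

    True means the token is hidden from the encoder and must be reconstructed.
    The same spatial positions are hidden in every temporal slice.
    NOTE: strategy and ratio are illustrative assumptions, not SVFAP's settings.
    """
    rng = random.Random(seed)
    num_spatial = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_spatial))
    masked_positions = set(rng.sample(range(num_spatial), num_masked))
    return [
        [pos in masked_positions for pos in range(num_spatial)]
        for _ in range(num_slices)
    ]


# Hypothetical clip: 8 temporal slices, each a 10 x 10 grid of patch tokens.
mask = tube_mask(num_slices=8, grid_h=10, grid_w=10, mask_ratio=0.9)

# Only the visible tokens would be passed to the encoder.
visible = [
    (t, pos)
    for t, row in enumerate(mask)
    for pos, hidden in enumerate(row)
    if not hidden
]
print(len(visible), "visible tokens out of", 8 * 10 * 10)
```

With a high masking ratio the encoder processes only a small fraction of the tokens, which is one reason masked autoencoding is attractive for highly redundant video data.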

Task                                    Dataset   Model     Metric   Value   Global Rank
Dynamic Facial Expression Recognition   DFEW      SVFAP-S   WAR      72.67   #6
                                                            UAR      60.45   #7
Dynamic Facial Expression Recognition   DFEW      SVFAP-B   WAR      74.27   #4
                                                            UAR      62.83   #4
Dynamic Facial Expression Recognition   FERV39k   SVFAP-B   WAR      52.29   #2
                                                            UAR      42.14   #3
Dynamic Facial Expression Recognition   FERV39k   SVFAP-S   WAR      51.34   #4
                                                            UAR      41.19   #4
Dynamic Facial Expression Recognition   MAFW      SVFAP-S   WAR      53.89   #6
                                                            UAR      39.82   #7
Dynamic Facial Expression Recognition   MAFW      SVFAP-B   WAR      54.28   #5
                                                            UAR      41.19   #6
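
The two metrics reported above are weighted average recall (WAR, equivalent to overall accuracy) and unweighted average recall (UAR, the mean of per-class recalls). As a minimal sketch of how they are typically computed, assuming single-label classification, the helper below derives both from lists of true and predicted labels. The function name `war_uar` and the toy labels are made up for illustration and are not taken from the SVFAP code.

```python
from collections import defaultdict


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) in percent for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # WAR: every sample counts equally, so frequent classes dominate.
    war = 100.0 * sum(correct.values()) / len(y_true)
    # UAR: every class counts equally, regardless of its size.
    uar = 100.0 * sum(correct[c] / total[c] for c in total) / len(total)
    return war, uar


# Toy, imbalanced example (labels are made up for illustration).
y_true = ["happy"] * 8 + ["sad"] * 2
y_pred = ["happy"] * 8 + ["happy", "sad"]
war, uar = war_uar(y_true, y_pred)
print(f"WAR = {war:.2f}, UAR = {uar:.2f}")  # WAR = 90.00, UAR = 75.00
```

The gap between the two numbers in the toy example mirrors the gap in the table: on imbalanced expression datasets, UAR is usually lower than WAR because minority classes are harder to recognize.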

Methods