In Search of a Robust Facial Expressions Recognition Model: A Large-Scale Visual Cross-Corpus Study

Researchers have been seeking a robust emotion recognition system for the last two decades. Such a system would advance computer systems to a new level of interaction, providing much more natural feedback during human–computer interaction through analysis of the user's affective state. However, one of the key problems in this domain is a lack of generalization ability: model performance degrades dramatically when a model is trained on one corpus and evaluated on another. Although some studies have been done in this direction, the visual modality remains under-investigated. Therefore, we introduce a visual cross-corpus study conducted on eight corpora, which differ in recording conditions, participants’ appearance characteristics, and complexity of data processing. We propose a visual end-to-end emotion recognition framework, which consists of a robust pre-trained backbone model and a temporal sub-system that models temporal dependencies across many video frames. In addition, a detailed analysis of the mistakes and advantages of the backbone model is provided, demonstrating its high generalization ability. Our results show that the backbone model achieves an accuracy of 66.4% on the AffectNet dataset, outperforming all state-of-the-art results. Moreover, the CNN-LSTM model demonstrates decent efficacy on dynamic visual datasets in cross-corpus experiments, achieving results comparable with the state of the art. In addition, we provide the backbone and CNN-LSTM models for future researchers: they can be accessed via GitHub.

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Facial Expression Recognition (FER) | AffectNet | EmoAffectNet | Accuracy (7 emotion) | 66.49 | # 8 |
| Facial Expression Recognition (FER) | Aff-Wild2 | EmoAffectNet LSTM | UAR | 52.9 | # 1 |
| Facial Expression Recognition (FER) | CREMA-D | EmoAffectNet LSTM | UAR | 79.0 | # 1 |
| Facial Expression Recognition (FER) | RAVDESS | EmoAffectNet LSTM | UAR | 69.7 | # 1 |
| Facial Expression Recognition (FER) | SAVEE | EmoAffectNet LSTM | UAR | 82.8 | # 1 |
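
The dynamic corpora above are scored with UAR, which we read here as unweighted average recall: the mean of the per-class recalls, so that every emotion class counts equally regardless of how many samples it has. The sketch below is only an illustration of that metric, not the authors' evaluation code; the function names and the toy labels are assumptions made for the example.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # all samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels: a classifier that always predicts
# the majority class looks fine on accuracy but poor on UAR.
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 10

print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

Because emotion corpora are often class-imbalanced, a metric of this kind is less flattering to majority-class predictions than plain accuracy, which is why the two can diverge as in the toy case.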

Methods