Speech Emotion Recognition with Multi-Task Learning

Speech emotion recognition (SER) classifies speech into emotion categories such as Happy, Angry, Sad, and Neutral. Recently, deep learning has been applied to the SER task. This paper proposes a multi-task learning (MTL) framework that simultaneously performs speech-to-text recognition and emotion classification, using an end-to-end deep neural model based on wav2vec-2.0. Experiments on the IEMOCAP benchmark show that the proposed method achieves state-of-the-art performance on the SER task. In addition, an ablation study establishes the effectiveness of the proposed MTL framework.
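
The abstract describes the architecture only at a high level. Below is a minimal sketch, not the authors' released code, of how such a wav2vec 2.0 multi-task model could be wired up in PyTorch: a shared encoder feeds a frame-level CTC head for speech-to-text and a mean-pooled classification head for emotion, and the two losses are combined with a weight alpha. The backbone checkpoint name, mean pooling, four-class label set, and alpha value are assumptions, not details taken from the paper.

```python
# Sketch of a wav2vec 2.0 multi-task SER model (assumptions noted above).
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

EMOTIONS = ["happy", "angry", "sad", "neutral"]  # assumed 4-class setup

class MultiTaskSER(nn.Module):
    def __init__(self, vocab_size, num_emotions=len(EMOTIONS),
                 backbone="facebook/wav2vec2-base"):  # assumed checkpoint
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(backbone)
        hidden = self.encoder.config.hidden_size
        self.ctc_head = nn.Linear(hidden, vocab_size)    # speech-to-text (CTC)
        self.cls_head = nn.Linear(hidden, num_emotions)  # emotion classification

    def forward(self, input_values):
        # input_values: (batch, samples) raw waveform at 16 kHz
        hidden_states = self.encoder(input_values).last_hidden_state  # (B, T, H)
        ctc_logits = self.ctc_head(hidden_states)                     # per-frame logits
        emo_logits = self.cls_head(hidden_states.mean(dim=1))         # utterance-level
        return ctc_logits, emo_logits

def joint_loss(ctc_logits, emo_logits, transcripts, transcript_lens,
               frame_lens, emotions, alpha=0.1):
    """Weighted sum of CTC (ASR) loss and cross-entropy (emotion) loss.
    `alpha` and the weighting scheme are assumed hyperparameters;
    `frame_lens` are the encoder output lengths per utterance."""
    log_probs = ctc_logits.log_softmax(dim=-1).transpose(0, 1)  # (T, B, V) for CTCLoss
    asr_loss = nn.CTCLoss(blank=0, zero_infinity=True)(
        log_probs, transcripts, frame_lens, transcript_lens)
    emo_loss = nn.functional.cross_entropy(emo_logits, emotions)
    return alpha * asr_loss + (1 - alpha) * emo_loss
```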


Datasets

IEMOCAP
Results from the Paper


Task                       | Dataset | Model        | Metric Name | Metric Value | Global Rank
Speech Emotion Recognition | IEMOCAP | SER with MTL | F1          | -            | # 2
Speech Emotion Recognition | IEMOCAP | SER with MTL | WA CV       | 0.789        | # 1
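
For reference, these metrics are typically computed as follows on IEMOCAP: WA (weighted accuracy) is the overall accuracy across all utterances, F1 is usually macro-averaged over the emotion classes, and "CV" indicates cross-validation over sessions. A minimal sketch using scikit-learn, with hypothetical y_true / y_pred arrays:

```python
# Assumed metric conventions: WA = overall accuracy, F1 = macro-averaged F1.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 2, 3, 1, 0]   # ground-truth emotion ids (toy data)
y_pred = [0, 1, 2, 3, 0, 0]   # predicted emotion ids (toy data)

wa = accuracy_score(y_true, y_pred)             # weighted accuracy (WA)
f1 = f1_score(y_true, y_pred, average="macro")  # macro F1

print(f"WA={wa:.3f}  F1={f1:.3f}")
```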

Methods


wav2vec 2.0 · Multi-Task Learning (MTL)