Filter-based multi-task cross-corpus feature learning for speech emotion recognition

Speech emotion recognition is a highly active area of research in human–machine interaction. A primary challenge in this field is coping with shifts in data distribution across corpora. Over the last decade, studies have proposed effective methods to address this issue, one of which is multi-task learning. Previous multi-task approaches to speech emotion recognition have been wrapper-based, solving feature selection and classification simultaneously. This study instead builds on a classic multi-task learning algorithm to introduce a simple yet effective filter-based multi-task approach to speech emotion recognition. To assess its effectiveness, the proposed method is evaluated on eight well-known public speech emotion corpora and compared against eight of the strongest approaches in the literature. Encouraged by these results in both simplicity and efficiency, a more intensive exploration of the method is conducted, yielding a cross-corpus feature set for speech emotion recognition. The proposed feature set is tested in both within-corpus and cross-corpus scenarios on seven corpora, and is further applied to a new corpus in a language not present in the earlier data. Extensive experiments show superior results.
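To make the filter-based idea concrete, the following is a minimal sketch of how a cross-corpus feature set could be selected by a filter criterion shared across tasks: each corpus is treated as a task, features are ranked per corpus with a simple Fisher-score filter, and the ranks are aggregated so that only features useful across all corpora survive. The Fisher score and mean-rank aggregation here are illustrative assumptions, not necessarily the exact criterion used in the paper.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

def cross_corpus_select(corpora, k):
    """Filter-based multi-task selection (sketch).

    corpora: list of (X, y) pairs, one per corpus (task).
    Features are ranked within each corpus by Fisher score, the
    ranks are averaged across corpora, and the top-k features by
    mean rank form the cross-corpus feature set.
    """
    ranks = []
    for X, y in corpora:
        scores = fisher_scores(X, y)
        order = np.argsort(-scores)          # best feature first
        r = np.empty_like(order)
        r[order] = np.arange(len(order))     # rank 0 = best
        ranks.append(r)
    mean_rank = np.mean(ranks, axis=0)
    return np.argsort(mean_rank)[:k]

# Toy usage: two synthetic "corpora" where only feature 0 is discriminative
rng = np.random.default_rng(0)
corpora = []
for _ in range(2):
    y = rng.integers(0, 2, 100)
    X = rng.normal(size=(100, 5))
    X[:, 0] += 3.0 * y                       # feature 0 separates the classes
    corpora.append((X, y))
selected = cross_corpus_select(corpora, 2)
```

Because the filter score is computed independently of any classifier, the selected subset can be reused with any downstream model, which is the practical distinction from the wrapper-based methods mentioned above.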
