Speech Emotion Recognition

98 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine a speaker's emotional state, such as happiness, anger, sadness, or frustration, from acoustic cues in their speech, including prosody, pitch, and rhythm.
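To make the acoustic cues mentioned above concrete, here is a minimal sketch of three classic frame-level descriptors: energy, zero-crossing rate, and an autocorrelation-based pitch estimate. It assumes mono audio already available as a list of floats in [-1, 1]. The function names, the 50-500 Hz search range, and the synthetic test tone are illustrative assumptions, not taken from any paper listed on this page.

```python
import math


def rms_energy(frame):
    """Root-mean-square energy of a frame, a rough loudness proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def estimate_pitch_hz(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency of a frame.

    Picks the lag with the highest (unnormalized) autocorrelation inside
    the lag range implied by fmin and fmax (assumed search bounds).
    Returns 0.0 when no lag has positive correlation, e.g. for silence.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            frame[i] * frame[i + lag] for i in range(len(frame) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Hypothetical input: a 50 ms synthetic 200 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)
]
features = {
    "rms": rms_energy(tone),
    "zcr": zero_crossing_rate(tone),
    "pitch_hz": estimate_pitch_hz(tone, SAMPLE_RATE),
}
```

In practice, descriptors like these would be computed over many short overlapping frames and summarized per utterance before being passed to a classifier. Many recent systems instead learn representations directly from spectrograms or raw waveforms.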

For multimodal emotion recognition, please upload your results to Multimodal Emotion Recognition on IEMOCAP.

nEMO: Dataset of Emotional Speech in Polish

faceonlive/ai-research 9 Apr 2024

Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems.

152
09 Apr 2024

Unlocking the Emotional States of High-Risk Suicide Callers through Speech Analysis

alaaNfissi/Unlocking-the-Emotional-States-of-High-Risk-Suicide-Callers-through-Speech-Analysis IEEE 18th International Conference on Semantic Computing (ICSC) 2024

In light of these challenges, we present a novel end-to-end (E2E) method for speech emotion recognition (SER) as a means of detecting changes in emotional state that may indicate a high risk of suicide.

1
22 Mar 2024

Iterative Feature Boosting for Explainable Speech Emotion Recognition

alaaNfissi/Iterative-Feature-Boosting-for-Explainable-Speech-Emotion-Recognition International Conference on Machine Learning and Applications (ICMLA) 2024

In speech emotion recognition (SER), using pre-defined features without considering their practical importance may lead to high dimensional datasets, including redundant and irrelevant information.

2
19 Mar 2024

Speech Emotion Recognition Via CNN-Transforemr and Multidimensional Attention Mechanism

scnu-rislab/cnn-transforemr-and-multidimensional-attention-mechanism 7 Mar 2024

In this paper, to model local and global information at different levels of granularity in speech and capture temporal, spatial and channel dependencies in speech signals, we propose a Speech Emotion Recognition network based on CNN-Transformer and multi-dimensional attention mechanisms.

4
07 Mar 2024

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

EMOsuperb/EMO-SUPERB-submission 20 Feb 2024

Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems.

19
20 Feb 2024

Filter-based multi-task cross-corpus feature learning for speech emotion recognition

BakhtiariB/MTCCFLset Signal, Image and Video Processing 2024

In investigating the effectiveness of its proposed method, the present research experiments on eight well-known public speech emotion corpora and compares the results with eight of the best approaches in the literature.

2
20 Feb 2024

Frame-level emotional state alignment method for speech emotion recognition

asolitaryman/hflea 27 Dec 2023

To address this problem, we propose a frame-level emotional state alignment method for SER.

8
27 Dec 2023

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

alibaba-damo-academy/FunASR 23 Dec 2023

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

3,284
23 Dec 2023

Leveraged Mel spectrograms using Harmonic and Percussive Components in Speech Emotion Recognition

DavidHason/ser Pacific-Asia Conference on Knowledge Discovery and Data Mining 2022

We attempt to leverage the Mel spectrogram by decomposing distinguishable acoustic features for exploitation in our proposed architecture, which includes a novel feature map generator algorithm, a CNN-based network feature extractor and a multi-layer perceptron (MLP) classifier.

4
18 Dec 2023

An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance

DavidHason/VGG-optiVMD Pacific-Asia Conference on Knowledge Discovery and Data Mining 2023

Emotion recognition (ER) from speech signals is a robust approach since it cannot be imitated like facial expression or text-based sentiment analysis.

3
18 Dec 2023