Speech Emotion Recognition

100 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine a speaker's emotional state, such as happiness, anger, sadness, or frustration, from speech cues like prosody, pitch, and rhythm.
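
Illustratively, a minimal SER pipeline extracts prosodic and spectral cues and feeds them to a classifier. The sketch below is a generic example not tied to any paper listed here; the file names, labels, and classifier choice are assumptions.

```python
# Minimal SER sketch: hand-crafted prosodic/spectral features + a simple classifier.
# File paths and emotion labels below are placeholders.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)        # pitch contour (prosody)
    rms = librosa.feature.rms(y=y)                       # energy / loudness proxy
    # Summarize each time series with its mean and standard deviation.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [np.nanmean(f0), np.nanstd(f0)],
        [rms.mean(), rms.std()],
    ])

# Hypothetical training pairs; replace with a real corpus such as IEMOCAP or RAVDESS.
data = [("clip_001.wav", "happy"), ("clip_002.wav", "angry")]
X = np.stack([extract_features(path) for path, _ in data])
y = [label for _, label in data]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X))
```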

For multimodal emotion recognition, please submit your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.

Leveraged Mel spectrograms using Harmonic and Percussive Components in Speech Emotion Recognition

DavidHason/ser • Pacific-Asia Conference on Knowledge Discovery and Data Mining 2022

We attempt to leverage the Mel spectrogram by decomposing distinguishable acoustic features for exploitation in our proposed architecture, which includes a novel feature map generator algorithm, a CNN-based network feature extractor, and a multi-layer perceptron (MLP) classifier. (A rough sketch of the harmonic/percussive decomposition step follows this entry.)

4
18 Dec 2023
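
For context, the harmonic/percussive decomposition referenced in the entry above can be reproduced with librosa's median-filtering HPSS. This is only a sketch of the decomposition step under assumed parameters, not the paper's feature map generator; the file name is a placeholder.

```python
# Sketch: split a speech spectrogram into harmonic and percussive components,
# then build a Mel representation of each (file name and parameters are placeholders).
import numpy as np
import librosa

y, sr = librosa.load("speech_clip.wav", sr=16000)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Median-filtering HPSS: harmonic energy is horizontal in time,
# percussive energy is vertical across frequency.
S_harm, S_perc = librosa.decompose.hpss(S)

mel_fb = librosa.filters.mel(sr=sr, n_fft=1024, n_mels=64)
mel_harm = librosa.power_to_db(mel_fb @ (S_harm ** 2))
mel_perc = librosa.power_to_db(mel_fb @ (S_perc ** 2))

# Stack the two views as input channels for a downstream CNN classifier.
features = np.stack([mel_harm, mel_perc], axis=0)
print(features.shape)  # (2, 64, n_frames)
```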

An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance

DavidHason/VGG-optiVMD • Pacific-Asia Conference on Knowledge Discovery and Data Mining 2023

Emotion recognition (ER) from speech signals is a robust approach since speech cannot be imitated in the way facial expressions or text-based sentiment can.

3
18 Dec 2023

Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition

isaaconline/speat • 29 Oct 2023

We compare biases found in pre-trained models to biases in downstream models adapted to the task of Speech Emotion Recognition (SER) and find that in 66 of the 96 tests performed (69%), the group that is more associated with positive valence as indicated by the SpEAT also tends to be predicted as speaking with higher valence by the downstream model. (A generic WEAT-style effect-size sketch follows this entry.)

4
29 Oct 2023
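
The SpEAT measure mentioned above follows the embedding association test recipe. Below is a hedged, generic WEAT-style effect-size computation over pooled speech embeddings; the arrays, dimensions, and group names are placeholders, not the paper's stimuli.

```python
# Generic WEAT-style association effect size over speech embeddings (illustrative only).
# X, Y: embeddings for two target groups; A, B: embeddings for positive- and
# negative-valence attribute stimuli. All arrays below are random placeholders.
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # Mean similarity to the positive attribute set minus the negative one.
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    x_assoc = np.array([association(x, A, B) for x in X])
    y_assoc = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([x_assoc, y_assoc])
    return (x_assoc.mean() - y_assoc.mean()) / pooled.std(ddof=1)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 768)), rng.normal(size=(20, 768))  # target groups
A, B = rng.normal(size=(25, 768)), rng.normal(size=(25, 768))  # valence attributes
print(effect_size(X, Y, A, B))
```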

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

alibaba-damo-academy/funcodec • 7 Oct 2023

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

280
07 Oct 2023

Decoding Emotions: A Comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

95anantsingh/decoding-emotions • 17 Aug 2023

Recent advancements in transformer-based speech representation models have greatly transformed speech processing.

3
17 Aug 2023

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

cecile-hi/regularized-adaptive-weight-modification • 7 Aug 2023

The orthogonal weight modification used to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. (A generic sketch of the orthogonal weight modification projection follows this entry.)

14
07 Aug 2023
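
For background, orthogonal weight modification (OWM) constrains each update to lie in the subspace orthogonal to inputs already seen, which is what the paper builds on. The numpy sketch below shows a generic recursive projector of that kind; it is not the regularized adaptive weight modification the paper proposes.

```python
# Generic OWM-style projector: gradients are projected so updates stay
# (approximately) orthogonal to the input subspace of earlier tasks.
import numpy as np

class OWMProjector:
    def __init__(self, dim, alpha=1e-3):
        self.P = np.eye(dim)   # projector onto the orthogonal complement of past inputs
        self.alpha = alpha     # regularizer trading off stability vs. plasticity

    def update(self, x):
        # Recursive least-squares-style update with one input vector x (shape: dim).
        x = x.reshape(-1, 1)
        k = self.P @ x / (self.alpha + x.T @ self.P @ x)
        self.P -= k @ (x.T @ self.P)

    def project(self, grad):
        # Project a weight gradient (dim x out) before applying it.
        return self.P @ grad

# Toy usage: after seeing task-1 inputs, later gradients are projected.
proj = OWMProjector(dim=8)
for x in np.random.default_rng(0).normal(size=(100, 8)):
    proj.update(x)
grad = np.random.default_rng(1).normal(size=(8, 4))
print(np.linalg.norm(proj.project(grad)) <= np.linalg.norm(grad))
```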

Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition

jiaxin-ye/emo-dna • 4 Aug 2023

On one hand, our contrastive emotion decoupling achieves decoupling learning via a contrastive decoupling loss to strengthen the separability of emotion-relevant features from corpus-specific ones. (A generic supervised contrastive loss sketch follows this entry.)

8
04 Aug 2023
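
To make the idea of contrastive decoupling concrete, the sketch below implements a generic supervised contrastive loss that pulls same-emotion embeddings together and pushes others apart. It is not the paper's contrastive decoupling loss; the batch size, feature dimension, and temperature are assumptions.

```python
# Generic supervised contrastive loss over emotion embeddings (illustrative only):
# embeddings sharing an emotion label are treated as positives for each anchor.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, temperature=0.1):
    z = F.normalize(z, dim=1)                        # (N, D) unit-norm embeddings
    sim = z @ z.T / temperature                      # pairwise similarity logits
    self_mask = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    # Average log-probability of positives for each anchor that has at least one.
    pos_log_prob = log_prob.masked_fill(~positives, 0.0)
    pos_counts = positives.sum(dim=1).clamp(min=1)
    loss = -pos_log_prob.sum(dim=1) / pos_counts
    return loss[positives.any(dim=1)].mean()

z = torch.randn(16, 128)              # hypothetical emotion-relevant features
labels = torch.randint(0, 4, (16,))   # four emotion classes, random demo labels
print(supervised_contrastive_loss(z, labels))
```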

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

happycolor/vesper • 20 Jul 2023

Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved.

21
20 Jul 2023

Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

hltchkust/elderly_ser • 26 Jun 2023

In this work, we analyze the transferability of emotion recognition across three different languages (English, Mandarin Chinese, and Cantonese) and two different age groups (adults and the elderly). (A sketch of the cross-group transfer evaluation loop follows this entry.)

9
26 Jun 2023
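
Transferability analyses like the one above are usually run as a grid of train-on-one-group, evaluate-on-another experiments. The loop below is a sketch of that protocol with synthetic features and a placeholder classifier; the group names and data are assumptions, not the paper's corpora.

```python
# Sketch of a cross-group transfer matrix: train on each (language, age) group
# and evaluate on every group. Features and labels below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
groups = ["english_adult", "english_elderly", "mandarin_adult",
          "mandarin_elderly", "cantonese_adult", "cantonese_elderly"]
# Placeholder corpora: 200 utterance-level feature vectors and 4 emotion labels each.
data = {g: (rng.normal(size=(200, 64)), rng.integers(0, 4, 200)) for g in groups}

transfer = np.zeros((len(groups), len(groups)))
for i, src in enumerate(groups):
    X_train, y_train = data[src]
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    for j, tgt in enumerate(groups):
        X_test, y_test = data[tgt]
        transfer[i, j] = accuracy_score(y_test, model.predict(X_test))

print(np.round(transfer, 2))  # rows: training group, columns: evaluation group
```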