Speech Emotion Recognition
100 papers with code • 14 benchmarks • 18 datasets
Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to infer a speaker's emotional state (happiness, anger, sadness, frustration, and so on) from speech cues such as prosody, pitch, and rhythm.
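The prosodic cues mentioned above (pitch, energy) can be sketched as a minimal frame-level feature extractor. This is an illustrative sketch only, not a reference SER pipeline: the frame length, hop size, and pitch-search range are assumptions, and real systems typically use much richer features (e.g., Mel spectrograms or pretrained speech embeddings) feeding a classifier.

```python
import numpy as np

def frame_features(signal, sr, frame_len=1024, hop=512):
    """Per-frame RMS energy and an autocorrelation-based pitch estimate.
    Illustrative only; real SER front ends use richer representations."""
    feats = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        # Pitch: strongest autocorrelation lag within a plausible F0 range
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = sr // 400, sr // 50          # search roughly 50-400 Hz
        lag = lo + np.argmax(ac[lo:hi])
        feats.append((rms, sr / lag))
    return np.array(feats)

# Toy check: a steady 200 Hz tone should yield pitch estimates near 200 Hz
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
feats = frame_features(tone, sr)
```

In practice these per-frame features would be pooled over an utterance and passed to a classifier that predicts the emotion label.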
For multimodal emotion recognition, please submit your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.
Libraries
Use these libraries to find Speech Emotion Recognition models and implementations.
Latest papers
Leveraged Mel spectrograms using Harmonic and Percussive Components in Speech Emotion Recognition
We attempt to leverage the Mel spectrogram by decomposing distinguishable acoustic features for exploitation in our proposed architecture, which includes a novel feature map generator algorithm, a CNN-based network feature extractor and a multi-layer perceptron (MLP) classifier.
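The harmonic/percussive decomposition referenced above can be illustrated with the standard median-filtering approach (Fitzgerald, 2010): harmonic content is smooth along time, percussive content is smooth along frequency. This is a generic sketch of that technique on a toy spectrogram, not the paper's feature map generator; the kernel size and hard masking are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S, kernel=17):
    """Split a magnitude spectrogram into harmonic and percussive parts
    via median filtering. Hard binary masks are used here for simplicity;
    soft (Wiener-style) masks are also common."""
    H = median_filter(S, size=(1, kernel))   # smooth across time frames
    P = median_filter(S, size=(kernel, 1))   # smooth across frequency bins
    mask_h = H >= P
    return S * mask_h, S * ~mask_h

# Toy spectrogram (freq x time): a steady tone is a horizontal line,
# a transient click is a vertical line
S = np.zeros((64, 64))
S[20, :] = 1.0   # harmonic: steady tone at one frequency bin
S[:, 40] = 1.0   # percussive: broadband transient at one frame
harm, perc = hpss_masks(S)
```

The two separated components (or their Mel spectrograms) can then be fed to a downstream network as distinguishable acoustic streams.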
An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance
Emotion recognition (ER) from speech signals is a robust approach since it cannot be imitated as easily as facial expressions or text-based sentiment analysis.
Pre-trained Speech Processing Models Contain Human-Like Biases that Propagate to Speech Emotion Recognition
We compare biases found in pre-trained models to biases in downstream models adapted to the task of Speech Emotion Recognition (SER) and find that in 66 of the 96 tests performed (69%), the group that is more associated with positive valence as indicated by the SpEAT also tends to be predicted as speaking with higher valence by the downstream model.
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.
Decoding Emotions: A Comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition
Recent advancements in transformer-based speech representation models have greatly transformed speech processing.
Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets.
Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition
On one hand, our contrastive emotion decoupling achieves decoupling learning via a contrastive decoupling loss to strengthen the separability of emotion-relevant features from corpus-specific ones.
A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion
Speech Emotion Recognition (SER) is a challenging task.
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved.
Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition
In this work, we analyze the transferability of emotion recognition across three different languages (English, Mandarin Chinese, and Cantonese) and two different age groups (adults and the elderly).