Speech Emotion Recognition

100 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition (SER) is a task in speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to infer a speaker's emotional state (for example happiness, anger, sadness, or frustration) from acoustic cues in their speech, such as pitch, rhythm, and other aspects of prosody.
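
As a rough illustration of what a cue such as pitch looks like computationally, the following minimal sketch estimates the fundamental frequency and the energy of a single audio frame using plain autocorrelation. The function names (synth_tone, estimate_pitch, frame_energy), the 8 kHz sampling rate, and the 50 to 400 Hz search range are assumptions chosen for this example; none of them come from the benchmarks or papers listed on this page, and many recent SER systems rely on learned representations rather than hand-computed features like these.

```python
import math


def synth_tone(freq_hz, sample_rate=8000, duration_s=0.05):
    """Generate a pure sine tone as a stand-in for a voiced speech frame."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def estimate_pitch(frame, sample_rate=8000, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Picks the lag with the largest (unnormalized) autocorrelation
    inside the lag range implied by [fmin, fmax].
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag == 0:
        raise ValueError("frame too short for the requested pitch range")
    return sample_rate / best_lag


def frame_energy(frame):
    """Root-mean-square energy of the frame, a crude loudness cue."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


if __name__ == "__main__":
    tone = synth_tone(200.0)  # hypothetical 200 Hz "voice"
    print(estimate_pitch(tone), frame_energy(tone))
```

In practice such per-frame values would be computed over a whole utterance and summarized (for instance mean and range of pitch) before being passed to a classifier; the sketch only shows the per-frame step on a synthetic tone.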

For multimodal emotion recognition, please upload your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.

Benchmarks

These leaderboards are used to track progress in Speech Emotion Recognition.

Dataset	Best Model
IEMOCAP	DANN
CREMA-D	ConformerXL-P
RAVDESS	VQ-MAE-S-12 (Frame) + Query2Emo
MSP-Podcast (Valence)	w2v2-L-robust-12
MSP-Podcast (Activation)	w2v2-L-robust-12
MSP-Podcast (Dominance)	w2v2-L-robust-12
ShEMO	CNN (1D)
EmoDB Dataset	VQ-MAE-S-12 (Frame) + Query2Emo
Dusha Crowd	Dusha baseline
Dusha Podcast	Dusha baseline
LSSED	PyResNet
EMODB	VGG-optiVMD
Quechua-SER	LSTM
MSP-IMPROV	emoDARTS

Libraries

Use these libraries to find Speech Emotion Recognition models and implementations

raulsteleac/Speech_Emotion_Recognit… • 3 papers
alibaba-damo-academy/FunASR • 2 papers • 3,383
aris-ai/Audio-and-text-based-emotio… • 2 papers • 138

Latest papers with no code

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation

no code yet • 19 Feb 2024

Foundation models have shown superior performance for speech emotion recognition (SER).

Persian Speech Emotion Recognition by Fine-Tuning Transformers

no code yet • 11 Feb 2024

Despite extensive discussions and global-scale efforts to enhance these systems, the application of this innovative and effective approach has received less attention in the context of Persian speech emotion recognition.

CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition

no code yet • 10 Feb 2024

Self-supervised learning (SSL) for automated speech recognition in terms of its emotional content can be heavily degraded by the presence of noise, affecting the efficiency of modeling the intricate temporal and spectral informative structures of speech.

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

no code yet • 4 Feb 2024

Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings.

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

no code yet • 2 Feb 2024

Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction.

How Paralingual are Paralinguistic Representations? A Case Study in Speech Emotion Recognition

no code yet • 2 Feb 2024

We also show that downstream models using TRILLsson representations achieve SOTA performance in terms of accuracy across various multi-lingual datasets.

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

no code yet • 24 Jan 2024

Therefore, in this paper, we incorporate two auxiliary tasks, ASR error detection (AED) and ASR error correction (AEC), to enhance the semantic coherence of ASR text, and further introduce a novel multi-modal fusion (MF) method to learn shared representations across modalities.

Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition

no code yet • 19 Jan 2024

These segment-level patches are then encoded using a stack of Swin blocks, in which a local window Transformer is utilized to explore local inter-frame emotional information across frame patches of each segment patch.

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition

no code yet • 19 Jan 2024

In order to leverage this information, we introduce a novel contrastive pretraining approach applied to emotion-unlabeled data for speech emotion recognition.

Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

no code yet • 18 Jan 2024

In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers.
