Speech Emotion Recognition

99 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine the emotional state of a speaker, such as happiness, anger, sadness, or frustration, from their speech patterns, such as prosody, pitch, and rhythm.
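
The prosodic cues mentioned above can be computed directly from a waveform. Below is a minimal, illustrative sketch (stdlib only; the function name and frame sizes are arbitrary choices, not from any listed paper) that extracts three simple per-frame cues: RMS energy, zero-crossing rate, and a crude autocorrelation pitch estimate.

```python
import math

def frame_features(signal, sr, frame_len=400, hop=200):
    """Compute per-frame RMS energy, zero-crossing rate, and an
    autocorrelation-based pitch (F0) estimate -- three simple prosodic cues."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_len - 1)
        # Pitch: find the autocorrelation peak within a plausible F0 band.
        best_lag, best_corr = 0, 0.0
        for lag in range(sr // 400, sr // 50):  # search 50-400 Hz
            corr = sum(frame[i] * frame[i - lag] for i in range(lag, frame_len))
            if corr > best_corr:
                best_lag, best_corr = lag, corr
        f0 = sr / best_lag if best_lag else 0.0
        feats.append((rms, zcr, f0))
    return feats

# Example: a pure 200 Hz tone should yield an F0 estimate near 200 Hz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]
features = frame_features(tone, sr)
```

Real systems use richer descriptors (MFCCs, spectral features, learned embeddings), but the idea is the same: a fixed-length feature vector per frame, fed to a classifier.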

For multimodal emotion recognition, please submit your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.


Most implemented papers

Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

noetits/ICE-Talk 27 Mar 2019

The field of Text-to-Speech has seen substantial improvements in recent years, benefiting from deep learning techniques.

Attention-Augmented End-to-End Multi-Task Learning for Emotion Prediction from Speech

raulsteleac/Speech_Emotion_Recognition 29 Mar 2019

Despite the increasing research interest in end-to-end learning systems for speech emotion recognition, conventional systems either suffer from overfitting, due in part to limited training data, or do not explicitly consider the different contributions of automatically learnt representations to a specific task.

An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs

30stomercury/Interaction-aware_Attention_Network ICASSP 2019

In this work, we propose an interaction-aware attention network (IAAN) that incorporates contextual information into the learned vocal representation through a novel attention mechanism.
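
The core idea of attending over dialog context can be sketched with generic dot-product attention (this is a simplified illustration, not the IAAN architecture itself): each prior-turn vector is weighted by its softmax-normalized similarity to the current utterance and the weighted vectors are summed.

```python
import math

def attend(query, contexts):
    """Generic dot-product attention: weight each context vector by its
    softmax-normalized similarity to the query, then sum them."""
    scores = [sum(q * c for q, c in zip(query, ctx)) for ctx in contexts]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    return [sum(w * ctx[i] for w, ctx in zip(weights, contexts))
            for i in range(dim)]

# The current utterance representation attends over two prior turns.
query = [1.0, 0.0]
contexts = [[1.0, 0.0], [0.0, 1.0]]
summary = attend(query, contexts)
```

The context most similar to the query dominates the summary vector, which is then combined with the utterance representation for classification.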

Learning Alignment for Multimodal Emotion Recognition from Speech

ZhiqiWang12-hash/text_audio_classification 6 Sep 2019

Further, although emotion recognition can benefit from audio-textual multimodal information, it is not trivial to build a system that learns from multiple modalities.

Speech Emotion Recognition Using Speech Feature and Word Embedding

bagustris/Apsipa2019_SpeechText APSIPA ASC 2019

Text features can be combined with speech features to improve emotion recognition accuracy, and both features can be obtained from speech.
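
A simple way to combine the two feature types, as a toy illustration (the function and vocabulary here are hypothetical, not the paper's pipeline), is early fusion: concatenate an acoustic feature vector with a bag-of-words vector built from the speech transcript.

```python
def fuse_features(acoustic, text_tokens, vocab):
    """Early fusion: concatenate an acoustic feature vector with a
    bag-of-words vector built from the (transcribed) text."""
    bow = [0] * len(vocab)
    for tok in text_tokens:
        if tok in vocab:
            bow[vocab[tok]] += 1
    return list(acoustic) + bow

vocab = {"great": 0, "terrible": 1, "fine": 2}  # hypothetical toy vocabulary
fused = fuse_features([0.71, 0.05, 200.0], ["that", "was", "great"], vocab)
# fused = [0.71, 0.05, 200.0, 1, 0, 0]
```

The fused vector is then passed to a single classifier; alternatives include late fusion, where separate acoustic and text classifiers are trained and their predictions combined.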

Attentive Modality Hopping Mechanism for Speech Emotion Recognition

david-yoon/attentive-modality-hopping-for-SER 29 Nov 2019

In this work, we explore the impact of visual modality in addition to speech and text for improving the accuracy of the emotion detection system.

Non-linear Neurons with Human-like Apical Dendrite Activations

raduionescu/pynada 2 Feb 2020

In order to classify linearly non-separable data, neurons are typically organized into multi-layer neural networks that are equipped with at least one hidden layer.
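
The hidden-layer claim can be seen concretely on XOR, the canonical linearly non-separable problem. The weights below are hand-chosen for illustration (this is not the paper's proposed apical-dendrite activation): no single threshold neuron can compute XOR, but two hidden units plus one output unit can.

```python
def step(x):
    """Heaviside threshold activation."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """A two-layer network with hand-set weights computing XOR."""
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are on
    return step(h1 - h2 - 0.5)  # "at least one, but not both"

outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs = [0, 1, 1, 0]
```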

Speech emotion recognition with deep convolutional neural networks

AnkushMalaker/speech-emotion-recognition Biomedical Signal Processing and Control 2020

Speech emotion recognition (or classification) is one of the most challenging topics in data science.

On The Differences Between Song and Speech Emotion Recognition: Effect of Feature Sets, Feature Types, and Classifiers

bagustris/ravdess_song_speech 1 Apr 2020

In this paper, we evaluate different feature sets, feature types, and classifiers on both song and speech emotion recognition.