Speech Emotion Recognition
100 papers with code • 14 benchmarks • 18 datasets
Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine the emotional state of a speaker, such as happiness, anger, sadness, or frustration, from acoustic cues such as prosody, pitch, and rhythm.
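As a rough illustration of the utterance-level formulation described above, here is a minimal NumPy sketch: it averages crude frame-level stand-ins for prosodic cues (energy and zero-crossing rate) over an utterance and classifies with a nearest-centroid rule. The feature set, synthetic signals, and classifier are all hypothetical simplifications; real SER systems use far richer acoustic features and learned models.

```python
import numpy as np

def utterance_features(signal, frame_len=400, hop=160):
    """Crude prosody-style features: per-frame energy and zero-crossing
    rate, summarized over the whole utterance (a stand-in for the
    richer acoustic features real SER systems use)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = [float(np.mean(f ** 2)) for f in frames]
    zcr = [float(np.mean(np.abs(np.diff(np.sign(f))) > 0)) for f in frames]
    return np.array([np.mean(energy), np.std(energy), np.mean(zcr)])

def nearest_centroid(train, labels, query):
    """Classify an utterance by its closest per-emotion feature centroid."""
    feats = np.array([utterance_features(s) for s in train])
    classes = sorted(set(labels))
    centroids = {c: feats[[l == c for l in labels]].mean(axis=0)
                 for c in classes}
    q = utterance_features(query)
    return min(classes, key=lambda c: np.linalg.norm(q - centroids[c]))

# Synthetic stand-ins: "angry" utterances loud and noisy,
# "sad" utterances quiet and tonal.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)
angry = [2.0 * rng.normal(size=16000) for _ in range(3)]
sad = [0.1 * np.sin(2 * np.pi * 200 * t) for _ in range(3)]
train = angry + sad
labels = ["angry"] * 3 + ["sad"] * 3

print(nearest_centroid(train, labels, 1.8 * rng.normal(size=16000)))
```

The nearest-centroid rule is only a placeholder for the discriminative models the papers below use; the point is the pipeline shape (frame features → utterance summary → classifier) shared by utterance-level SER approaches.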
For multimodal emotion recognition, please upload your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.
Libraries
Use these libraries to find Speech Emotion Recognition models and implementations.
Most implemented papers
SERAB: A multi-lingual benchmark for speech emotion recognition
To facilitate the process, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER.
Speech Emotion Diarization: Which Emotion Appears When?
Speech Emotion Recognition (SER) typically relies on utterance-level solutions.
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.
nEMO: Dataset of Emotional Speech in Polish
Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems.
Transfer Learning for Improving Speech Emotion Classification Accuracy
The majority of existing speech emotion recognition research focuses on automatic emotion detection using training and testing data from the same corpus, collected under the same conditions.
Attention Based Fully Convolutional Network for Speech Emotion Recognition
In this paper, we present a novel attention based fully convolutional network for speech emotion recognition.
Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech
Mel Frequency Cepstral Coefficients (MFCCs) are one of the most commonly used representations for audio speech recognition and classification.
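The MFCC pipeline the snippet refers to follows a standard recipe: frame the signal, take the power spectrum, apply a mel-spaced triangular filterbank, take logs, and apply a DCT-II. Below is a minimal pure-NumPy sketch of that recipe (frame sizes, filter counts, and the FFT length are illustrative defaults, not values from the paper); production code would typically use a library such as librosa instead.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """MFCCs: framed power spectrum -> mel filterbank -> log -> DCT-II."""
    frames = np.array([signal[i:i + frame_len]
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=512)) ** 2
    fb = mel_filterbank(n_filters, 512, sr)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II over the filterbank axis, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2 * n_filters)))
    return logmel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
feats = mfcc(sig)
print(feats.shape)  # (number of frames, 13)
```

Gammatone Frequency Cepstral Coefficients (GFCCs), which the paper evaluates as an alternative, follow the same cepstral recipe but replace the mel triangular filterbank with a gammatone filterbank modeled on the human auditory system.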
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems
In this paper, we present a database of emotional speech intended to be open-sourced and used for synthesis and generation purposes.
Integrating Recurrence Dynamics for Speech Emotion Recognition
We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER).
Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages
Cross-lingual speech emotion recognition is an important task for practical applications.