Speech Emotion Recognition
98 papers with code • 14 benchmarks • 18 datasets
Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine the emotional state of a speaker, such as happiness, anger, sadness, or frustration, from their speech patterns, such as prosody, pitch, and rhythm.
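To make the acoustic cues above concrete, here is a minimal NumPy-only sketch (frame length, hop size, and the toy signal are illustrative choices, not from any paper on this page) computing two classic frame-level correlates of prosody: short-time energy (loudness) and zero-crossing rate (voicing/noisiness).

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Mean squared amplitude per frame -- a simple loudness correlate."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes per frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

# Toy input: 1 s of a 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t)

frames = frame_signal(x)
energy = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
```

In a real SER pipeline these frame-level features (together with pitch contours, MFCCs, etc.) are aggregated over an utterance and fed to a classifier.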
For multimodal emotion recognition, please upload your results to Multimodal Emotion Recognition on IEMOCAP

Libraries
Use these libraries to find Speech Emotion Recognition models and implementations
Subtasks
Latest papers
nEMO: Dataset of Emotional Speech in Polish
Speech emotion recognition has become increasingly important in recent years due to its potential applications in healthcare, customer service, and personalization of dialogue systems.
Unlocking the Emotional States of High-Risk Suicide Callers through Speech Analysis
In light of these challenges, we present a novel end-to-end (E2E) method for speech emotion recognition (SER) as a means of detecting changes in emotional state that may indicate a high risk of suicide.
Iterative Feature Boosting for Explainable Speech Emotion Recognition
In speech emotion recognition (SER), using pre-defined features without considering their practical importance may lead to high-dimensional datasets containing redundant and irrelevant information.
Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism
In this paper, to model local and global information at different levels of granularity in speech and capture temporal, spatial and channel dependencies in speech signals, we propose a Speech Emotion Recognition network based on CNN-Transformer and multi-dimensional attention mechanisms.
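The paper's exact architecture is not reproduced here, but the core of an attention mechanism over one dimension of a feature map can be sketched in NumPy. In this toy version (the 50x64 feature map and the scoring vector `w` are random placeholders, not learned parameters), attention over the temporal axis pools frame-level features into a single utterance embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def temporal_attention_pool(X, w):
    """Weight each time step of a (T, D) feature map and pool to one (D,) vector.

    X : (T, D) frame-level features (e.g. CNN/Transformer outputs).
    w : (D,) scoring vector (learned in a real model; random here).
    """
    scores = X @ w              # one relevance score per frame
    alpha = softmax(scores)     # attention weights, non-negative, sum to 1
    return alpha @ X, alpha     # weighted average = utterance embedding

# Stand-in for frame features: 50 frames, 64 dimensions.
X = rng.standard_normal((50, 64))
w = rng.standard_normal(64)
emb, alpha = temporal_attention_pool(X, w)
```

Channel or spatial attention follows the same pattern with scores computed over the channel or frequency axis instead of time.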
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems.
Filter-based multi-task cross-corpus feature learning for speech emotion recognition
To investigate the effectiveness of the proposed method, the present research experiments on eight well-known public speech emotion corpora and compares the results with eight of the best approaches in the literature.
Frame-level emotional state alignment method for speech emotion recognition
To address this problem, we propose a frame-level emotional state alignment method for SER.
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.
Leveraged Mel spectrograms using Harmonic and Percussive Components in Speech Emotion Recognition
We attempt to leverage the Mel spectrogram by decomposing distinguishable acoustic features for exploitation in our proposed architecture, which includes a novel feature map generator algorithm, a CNN-based network feature extractor and a multi-layer perceptron (MLP) classifier.
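One standard way to decompose a spectrogram into harmonic and percussive components is median filtering (Fitzgerald-style HPSS): smoothing across time preserves sustained harmonics, smoothing across frequency preserves transient percussive energy. The sketch below is a generic NumPy illustration of that idea, not the paper's implementation; FFT size, hop, filter length, and the toy signal are arbitrary choices.

```python
import numpy as np

def stft_mag(x, n_fft=256, hop=64):
    """Magnitude spectrogram (freq, time) via framed real FFT with a Hann window."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def median_filter_1d(S, k, axis):
    """Median filter of odd length k along one axis of a 2-D array (edge-padded)."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (k // 2, k // 2)
    P = np.pad(S, pad, mode="edge")
    stacked = np.stack([np.roll(P, -i, axis=axis) for i in range(k)])
    sl = [slice(None)] * 2
    sl[axis] = slice(0, S.shape[axis])
    return np.median(stacked[(slice(None), *sl)], axis=0)

def hpss(S, k=17):
    """Split a magnitude spectrogram into harmonic and percussive parts."""
    H = median_filter_1d(S, k, axis=1)   # smooth across time -> harmonic
    P = median_filter_1d(S, k, axis=0)   # smooth across frequency -> percussive
    mask_h = H / (H + P + 1e-10)         # soft mask; masks sum to 1 per bin
    return S * mask_h, S * (1 - mask_h)

# Toy signal: a steady 440 Hz tone with periodic clicks.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
x[::1000] += 5.0
S = stft_mag(x)
harm, perc = hpss(S)
```

A Mel filterbank can then be applied to each component separately to obtain harmonic and percussive Mel spectrograms for the downstream classifier.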
An Extended Variational Mode Decomposition Algorithm Developed Speech Emotion Recognition Performance
Emotion recognition (ER) from speech signals is a robust approach, since speech is harder to imitate than facial expressions or text-based sentiment.