Speech Emotion Recognition

98 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine the emotional state of a speaker, such as happiness, anger, sadness, or frustration, from acoustic cues such as prosody, pitch, and rhythm.
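
As a minimal illustration of the acoustic cues mentioned above, the sketch below extracts a pitch contour and an energy envelope with librosa and reduces them to utterance-level statistics; the file name and frequency range are illustrative assumptions, not part of any listed paper.

```python
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file

# Pitch contour via probabilistic YIN (NaN on unvoiced frames)
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Frame-level energy, a rough loudness/prosody proxy
rms = librosa.feature.rms(y=y)[0]

# Simple utterance-level statistics often used as baseline SER features
features = np.array([
    np.nanmean(f0), np.nanstd(f0),  # pitch level and variability
    rms.mean(), rms.std(),          # energy level and variability
])
print(features)
```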

For multimodal emotion recognition, please submit your results to the Multimodal Emotion Recognition on IEMOCAP benchmark

Latest papers with no code

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

no code yet • 21 Apr 2024

This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio.
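
The abstract names two inputs (spectrograms and raw audio) fused by attention. Below is a generic PyTorch sketch of that two-branch idea, with one branch attending to the other; all layer sizes and the cross-attention fusion scheme are illustrative assumptions, not the MFHCA architecture itself.

```python
import torch
import torch.nn as nn

class TwoBranchFusionSER(nn.Module):
    def __init__(self, d=128, n_classes=4):
        super().__init__()
        # Spectrogram branch: 2-D convolution over (freq, time)
        self.spec = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse the frequency axis
        )
        self.spec_proj = nn.Linear(32, d)
        # Raw-audio branch: 1-D convolution over samples
        self.wave = nn.Sequential(nn.Conv1d(1, 32, 80, stride=40), nn.ReLU())
        self.wave_proj = nn.Linear(32, d)
        # Cross-attention: spectrogram frames attend to waveform frames
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.head = nn.Linear(d, n_classes)

    def forward(self, spec, wave):
        s = self.spec(spec.unsqueeze(1))                   # (B, 32, 1, T)
        s = self.spec_proj(s.squeeze(2).transpose(1, 2))   # (B, T, d)
        w = self.wave(wave.unsqueeze(1))                   # (B, 32, T')
        w = self.wave_proj(w.transpose(1, 2))              # (B, T', d)
        fused, _ = self.attn(s, w, w)                      # fuse modalities
        return self.head(fused.mean(dim=1))                # utterance logits

logits = TwoBranchFusionSER()(torch.randn(2, 64, 100), torch.randn(2, 16000))
print(logits.shape)  # torch.Size([2, 4])
```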

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

no code yet • 19 Apr 2024

One persistent challenge in Speech Emotion Recognition (SER) is the ubiquitous environmental noise, which frequently results in diminished SER performance in practical use.
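
For context, a common response to this noise problem is an "enhance first, then recognise" pipeline. The sketch below shows that general pattern with a mask-based enhancer feeding a CNN classifier; both modules are generic stand-ins under assumed sizes, not TRNet's two-level refinement design.

```python
import torch
import torch.nn as nn

class MaskEnhancer(nn.Module):
    """Predicts a [0, 1] time-frequency mask to suppress noise."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, noisy_spec):                 # (B, 1, F, T)
        return noisy_spec * self.net(noisy_spec)   # masked spectrogram

class NoiseRobustSER(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.enhancer = MaskEnhancer()
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, noisy_spec):
        return self.classifier(self.enhancer(noisy_spec))

logits = NoiseRobustSER()(torch.randn(2, 1, 64, 100))
print(logits.shape)  # torch.Size([2, 4])
```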

Accuracy enhancement method for speech emotion recognition from spectrogram using temporal frequency correlation and positional information learning through knowledge transfer

no code yet • 26 Mar 2024

In this paper, we propose a method to improve the accuracy of speech emotion recognition (SER) by using a vision transformer (ViT) to attend to the correlation between frequency (y-axis) and time (x-axis) in the spectrogram, and by transferring positional information between ViTs through knowledge transfer.
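
A minimal sketch of the underlying setup: spectrogram patches become ViT tokens, and a learnable positional embedding encodes where each patch sits in frequency and time. Patch and embedding sizes are assumptions, and the knowledge-transfer step between ViTs is not shown.

```python
import torch
import torch.nn as nn

class SpectrogramViT(nn.Module):
    def __init__(self, n_mels=64, n_frames=96, patch=16, d=128, n_classes=4):
        super().__init__()
        n_patches = (n_mels // patch) * (n_frames // patch)
        # Patch embedding: non-overlapping patches over (freq, time)
        self.patchify = nn.Conv2d(1, d, kernel_size=patch, stride=patch)
        # Positional embedding carries each patch's freq/time location
        self.pos = nn.Parameter(torch.zeros(1, n_patches, d))
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_classes)

    def forward(self, spec):                                # (B, 1, F, T)
        x = self.patchify(spec).flatten(2).transpose(1, 2)  # (B, N, d)
        x = self.encoder(x + self.pos)
        return self.head(x.mean(dim=1))

logits = SpectrogramViT()(torch.randn(2, 1, 64, 96))
print(logits.shape)  # torch.Size([2, 4])
```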

emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition

no code yet • 21 Mar 2024

This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance.
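
The pairing being optimised is a CNN front-end feeding a sequential network. Below is a hedged sketch of that CNN + LSTM pattern; the DARTS search itself is omitted, and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMSER(nn.Module):
    def __init__(self, n_mels=64, hidden=128, n_classes=4):
        super().__init__()
        # CNN front-end over (freq, time); this is what DARTS would search
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),  # pool frequency only, keep time steps
        )
        self.lstm = nn.LSTM(32 * (n_mels // 2), hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, spec):                              # (B, 1, F, T)
        x = self.cnn(spec)                                # (B, C, F/2, T)
        b, c, f, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * f)    # frame sequence
        _, (h, _) = self.lstm(x)                          # final hidden state
        return self.head(h[-1])

logits = CNNLSTMSER()(torch.randn(2, 1, 64, 100))
print(logits.shape)  # torch.Size([2, 4])
```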

The NeurIPS 2023 Machine Learning for Audio Workshop: Affective Audio Benchmarks and Novel Data

no code yet • 21 Mar 2024

In this short white paper, to encourage researchers with limited access to large datasets, the organizers first outline several open-source datasets available to the community and, for the duration of the workshop, make several proprietary datasets available.

Speech emotion recognition from voice messages recorded in the wild

no code yet • 4 Mar 2024

The pre-trained Unispeech-L model and its combination with eGeMAPS achieved the highest results, with 61.64% and 55.57% Unweighted Accuracy (UA) for 3-class valence and arousal prediction respectively, a 10% improvement over baseline models.
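
The combination described is pretrained speech embeddings plus eGeMAPS functionals. Below is a hedged sketch of one way to build such a fused feature vector; the checkpoint name (a stand-in for "Unispeech-L"), the mean-pooling, and the plain concatenation are assumptions, not the paper's exact setup.

```python
import numpy as np
import opensmile
import torch
from transformers import AutoFeatureExtractor, AutoModel

# 88 eGeMAPS functionals per utterance via openSMILE
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

name = "microsoft/unispeech-sat-large"  # assumed stand-in checkpoint
extractor = AutoFeatureExtractor.from_pretrained(name)
model = AutoModel.from_pretrained(name)

wav = np.random.randn(16000).astype(np.float32)  # stand-in voice message

egemaps = smile.process_signal(wav, 16000).to_numpy().squeeze()

# Mean-pooled transformer embedding for the same utterance
inputs = extractor(wav, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    emb = model(**inputs).last_hidden_state.mean(dim=1).squeeze(0)

# Fused vector, fed to a valence/arousal classifier head downstream
fused = np.concatenate([emb.numpy(), egemaps])
print(fused.shape)
```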

SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech

no code yet • 1 Mar 2024

Exploring deep learning models for these predictions, the paper compares single-output, multi-output, and sequential architectures.
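
A multi-output model in this sense shares one encoder across separate age, gender, and emotion heads. The sketch below shows that generic pattern; the input features, sizes, and head definitions are assumptions, not the SEGAA architecture.

```python
import torch
import torch.nn as nn

class MultiOutputSpeechModel(nn.Module):
    def __init__(self, n_feats=88, d=128, n_emotions=4, n_age_bins=6):
        super().__init__()
        # Shared encoder over utterance-level acoustic features
        self.encoder = nn.Sequential(nn.Linear(n_feats, d), nn.ReLU())
        self.emotion = nn.Linear(d, n_emotions)  # emotion classes
        self.gender = nn.Linear(d, 2)            # binary gender
        self.age = nn.Linear(d, n_age_bins)      # age groups

    def forward(self, x):
        h = self.encoder(x)
        return self.emotion(h), self.gender(h), self.age(h)

emo, gen, age = MultiOutputSpeechModel()(torch.randn(2, 88))
# Train with a summed cross-entropy loss over the three heads.
print(emo.shape, gen.shape, age.shape)
```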

Mixer is more than just a model

no code yet • 28 Feb 2024

In the field of computer vision, MLP-Mixer is noted for its ability to extract data information from both channel and token perspectives, effectively acting as a fusion of channel and token information.
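 
The channel/token duality the abstract mentions is visible in a single Mixer block: one MLP mixes across tokens, the other across channels. This is the standard MLP-Mixer design, not this paper's specific adaptation to speech.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, n_tokens, d_channels, expansion=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_channels)
        # Token mixing: MLP applied across the token dimension
        self.token_mlp = nn.Sequential(
            nn.Linear(n_tokens, n_tokens * expansion), nn.GELU(),
            nn.Linear(n_tokens * expansion, n_tokens),
        )
        self.norm2 = nn.LayerNorm(d_channels)
        # Channel mixing: MLP applied per token across channels
        self.channel_mlp = nn.Sequential(
            nn.Linear(d_channels, d_channels * expansion), nn.GELU(),
            nn.Linear(d_channels * expansion, d_channels),
        )

    def forward(self, x):                          # (B, tokens, channels)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

out = MixerBlock(n_tokens=49, d_channels=128)(torch.randn(2, 49, 128))
print(out.shape)  # torch.Size([2, 49, 128])
```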

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation

no code yet • 19 Feb 2024

Foundation models have shown superior performance for speech emotion recognition (SER).
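
As a concrete instance of parameter-efficient finetuning, the sketch below applies LoRA (one common PEFT method) to a speech foundation model with the peft library; the checkpoint, target modules, label count, and LoRA hyperparameters are illustrative assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForAudioClassification

model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base", num_labels=4  # 4 emotion classes, assumed
)

config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # attention projections in wav2vec2
)
model = get_peft_model(model, config)

# Only the small LoRA adapters (plus the new head) are trainable
model.print_trainable_parameters()
```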

Persian Speech Emotion Recognition by Fine-Tuning Transformers

no code yet • 11 Feb 2024

Despite extensive discussion and global-scale efforts to enhance these systems, fine-tuned transformer models have received less attention in the context of Persian speech emotion recognition.
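
A hedged sketch of the setup the title describes: adapting a multilingual pretrained transformer to emotion classification in a lower-resourced language. The XLSR checkpoint, label count, and frozen feature extractor are assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForAudioClassification

model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",  # multilingual pretraining covers Persian
    num_labels=6,                       # e.g. six emotion categories, assumed
)

# Freezing the convolutional feature extractor is a common choice when
# fine-tuning on a small emotion corpus
model.freeze_feature_encoder()

logits = model(torch.randn(1, 16000)).logits  # one second of 16 kHz audio
print(logits.shape)  # torch.Size([1, 6])
```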