Speech Emotion Recognition

100 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine the emotional state of a speaker, such as happiness, anger, sadness, or frustration, from their speech patterns, such as prosody, pitch, and rhythm.

For multimodal emotion recognition, please upload your result to Multimodal Emotion Recognition on IEMOCAP

Libraries

Use these libraries to find Speech Emotion Recognition models and implementations

Latest papers with no code

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation

no code yet • 19 Feb 2024

Foundation models have shown superior performance for speech emotion recognition (SER).

Persian Speech Emotion Recognition by Fine-Tuning Transformers

no code yet • 11 Feb 2024

Despite extensive discussions and global-scale efforts to enhance these systems, the application of this innovative and effective approach has received less attention in the context of Persian speech emotion recognition.

CochCeps-Augment: A Novel Self-Supervised Contrastive Learning Using Cochlear Cepstrum-based Masking for Speech Emotion Recognition

no code yet • 10 Feb 2024

Self-supervised learning (SSL) for automated speech recognition in terms of its emotional content, can be heavily degraded by the presence noise, affecting the efficiency of modeling the intricate temporal and spectral informative structures of speech.

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

no code yet • 4 Feb 2024

Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings.

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

no code yet • 2 Feb 2024

Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction.

How Paralingual are Paralinguistic Representations? A Case Study in Speech Emotion Recognition

no code yet • 2 Feb 2024

We also show that downstream models using TRILLsson representations achieve SOTA performance in terms of accuracy across various multi-lingual datasets.

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

no code yet • 24 Jan 2024

Therefore, in this paper, we incorporate two auxiliary tasks, ASR error detection (AED) and ASR error correction (AEC), to enhance the semantic coherence of ASR text, and further introduce a novel multi-modal fusion (MF) method to learn shared representations across modalities.

Speech Swin-Transformer: Exploring a Hierarchical Transformer with Shifted Windows for Speech Emotion Recognition

no code yet • 19 Jan 2024

These segment-level patches are then encoded using a stack of Swin blocks, in which a local window Transformer is utilized to explore local inter-frame emotional information across frame patches of each segment patch.

Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition

no code yet • 19 Jan 2024

In order to leverage this information, we introduce a novel contrastive pretraining approach applied to emotion-unlabeled data for speech emotion recognition.

Improving Speaker-independent Speech Emotion Recognition Using Dynamic Joint Distribution Adaptation

no code yet • 18 Jan 2024

In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers.