Speech Emotion Recognition

99 papers with code • 14 benchmarks • 18 datasets

Speech Emotion Recognition is a task of speech processing and computational paralinguistics that aims to recognize and categorize the emotions expressed in spoken language. The goal is to determine the emotional state of a speaker, such as happiness, anger, sadness, or frustration, from their speech patterns, such as prosody, pitch, and rhythm.
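
The prosodic cues mentioned above can be computed directly from a waveform. Below is a minimal, illustrative sketch (stdlib only; the function name and frame sizes are arbitrary choices, not from any listed paper) that extracts three simple per-frame cues: RMS energy, zero-crossing rate, and a crude autocorrelation pitch estimate.

```python
import math

def frame_features(signal, sr, frame_len=400, hop=200):
    """Compute per-frame RMS energy, zero-crossing rate, and an
    autocorrelation-based pitch (F0) estimate -- three simple prosodic cues."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_len - 1)
        # Pitch: find the autocorrelation peak within a plausible F0 band.
        best_lag, best_corr = 0, 0.0
        for lag in range(sr // 400, sr // 50):  # search 50-400 Hz
            corr = sum(frame[i] * frame[i - lag] for i in range(lag, frame_len))
            if corr > best_corr:
                best_lag, best_corr = lag, corr
        f0 = sr / best_lag if best_lag else 0.0
        feats.append((rms, zcr, f0))
    return feats

# Example: a pure 200 Hz tone should yield an F0 estimate near 200 Hz.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]
features = frame_features(tone, sr)
```

Real systems use richer descriptors (MFCCs, spectral features, learned embeddings), but the idea is the same: a fixed-length feature vector per frame, fed to a classifier.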

For multimodal emotion recognition, please submit your results to the Multimodal Emotion Recognition on IEMOCAP benchmark.


Most implemented papers

Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis

noetits/ICE-Talk 27 Mar 2019

The field of Text-to-Speech has seen substantial improvements in recent years, benefiting from deep learning techniques.

Attention-Augmented End-to-End Multi-Task Learning for Emotion Prediction from Speech

raulsteleac/Speech_Emotion_Recognition 29 Mar 2019

Despite the increasing research interest in end-to-end learning systems for speech emotion recognition, conventional systems either suffer from overfitting, due in part to limited training data, or do not explicitly consider the different contributions of automatically learnt representations to a specific task.

An Interaction-aware Attention Network for Speech Emotion Recognition in Spoken Dialogs

30stomercury/Interaction-aware_Attention_Network ICASSP 2019

In this work, we propose an interaction-aware attention network (IAAN) that incorporates contextual information into the learned vocal representation through a novel attention mechanism.
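
The core idea of attending over dialog context can be sketched with generic dot-product attention (this is a simplified illustration, not the IAAN architecture itself): each prior-turn vector is weighted by its softmax-normalized similarity to the current utterance and the weighted vectors are summed.

```python
import math

def attend(query, contexts):
    """Generic dot-product attention: weight each context vector by its
    softmax-normalized similarity to the query, then sum them."""
    scores = [sum(q * c for q, c in zip(query, ctx)) for ctx in contexts]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    return [sum(w * ctx[i] for w, ctx in zip(weights, contexts))
            for i in range(dim)]

# The current utterance representation attends over two prior turns.
query = [1.0, 0.0]
contexts = [[1.0, 0.0], [0.0, 1.0]]
summary = attend(query, contexts)
```

The context most similar to the query dominates the summary vector, which is then combined with the utterance representation for classification.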

Learning Alignment for Multimodal Emotion Recognition from Speech

ZhiqiWang12-hash/text_audio_classification 6 Sep 2019

Further, although emotion recognition can benefit from audio-textual multimodal information, it is not trivial to build a system that learns from multiple modalities.

Speech Emotion Recognition Using Speech Feature and Word Embedding

bagustris/Apsipa2019_SpeechText APSIPA ASC 2019

Text features can be combined with speech features to improve emotion recognition accuracy, and both features can be obtained from speech.
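
A simple way to combine the two feature types, as a toy illustration (the function and vocabulary here are hypothetical, not the paper's pipeline), is early fusion: concatenate an acoustic feature vector with a bag-of-words vector built from the speech transcript.

```python
def fuse_features(acoustic, text_tokens, vocab):
    """Early fusion: concatenate an acoustic feature vector with a
    bag-of-words vector built from the (transcribed) text."""
    bow = [0] * len(vocab)
    for tok in text_tokens:
        if tok in vocab:
            bow[vocab[tok]] += 1
    return list(acoustic) + bow

vocab = {"great": 0, "terrible": 1, "fine": 2}  # hypothetical toy vocabulary
fused = fuse_features([0.71, 0.05, 200.0], ["that", "was", "great"], vocab)
# fused = [0.71, 0.05, 200.0, 1, 0, 0]
```

The fused vector is then passed to a single classifier; alternatives include late fusion, where separate acoustic and text classifiers are trained and their predictions combined.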

Attentive Modality Hopping Mechanism for Speech Emotion Recognition

david-yoon/attentive-modality-hopping-for-SER 29 Nov 2019

In this work, we explore the impact of visual modality in addition to speech and text for improving the accuracy of the emotion detection system.

Non-linear Neurons with Human-like Apical Dendrite Activations

raduionescu/pynada 2 Feb 2020

In order to classify linearly non-separable data, neurons are typically organized into multi-layer neural networks that are equipped with at least one hidden layer.
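
The hidden-layer claim can be seen concretely on XOR, the canonical linearly non-separable problem. The weights below are hand-chosen for illustration (this is not the paper's proposed apical-dendrite activation): no single threshold neuron can compute XOR, but two hidden units plus one output unit can.

```python
def step(x):
    """Heaviside threshold activation."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    """A two-layer network with hand-set weights computing XOR."""
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is on
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are on
    return step(h1 - h2 - 0.5)  # "at least one, but not both"

outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs = [0, 1, 1, 0]
```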

Speech emotion recognition with deep convolutional neural networks

AnkushMalaker/speech-emotion-recognition Biomedical Signal Processing and Control 2020

Speech emotion recognition (or classification) is one of the most challenging topics in data science.

On The Differences Between Song and Speech Emotion Recognition: Effect of Feature Sets, Feature Types, and Classifiers

bagustris/ravdess_song_speech 1 Apr 2020

In this paper, we evaluate different feature sets, feature types, and classifiers on both song and speech emotion recognition.