no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier
While fully-supervised models have been shown to be effective for audiovisual speech emotion recognition (SER), the limited availability of labeled data remains a major challenge in the field.
no code implementations • 21 Apr 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier
The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector-quantized variational autoencoder.
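The two ingredients named above can be sketched together: a vector quantizer maps continuous latent vectors to discrete codebook indices, and a masked autoencoder hides a random subset of those tokens for reconstruction. This is a minimal illustration with assumed shapes and helper names, not the paper's implementation.

```python
import numpy as np

def quantize(latents, codebook):
    # Vector quantization: map each continuous latent vector (n, d) to the
    # index of its nearest codebook entry (K, d), giving discrete tokens (n,).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def random_mask(n_tokens, mask_ratio, rng):
    # MAE-style masking: True marks tokens hidden from the encoder,
    # which the model is trained to reconstruct.
    n_mask = int(round(mask_ratio * n_tokens))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[rng.permutation(n_tokens)[:n_mask]] = True
    return mask

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))                      # 8 codes, dim 4 (illustrative)
latents = codebook[[2, 5, 5, 1]] + 0.01 * rng.normal(size=(4, 4))
tokens = quantize(latents, codebook)                    # discrete token sequence
mask = random_mask(len(tokens), mask_ratio=0.5, rng=rng)
```

In the actual model a transformer would then predict the masked token indices from the visible ones; here the snippet only shows the discrete-latent and masking steps.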
Ranked #1 on Speech Emotion Recognition on the EmoDB dataset
1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, based on this orthogonality, develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.
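The role of orthogonality in independent control can be illustrated with a toy example: if two subspaces have orthonormal bases $U_a$ and $U_b$ with $U_a^\top U_b \approx 0$, then editing a latent vector's coordinates in $\mathrm{span}(U_a)$ leaves its $U_b$ coordinates unchanged. The bases and edit function below are assumed for illustration, not taken from the paper.

```python
import numpy as np

def edit_along_subspace(z, U, target):
    # Replace z's coordinates in span(U) (U orthonormal, shape (d, k))
    # with `target` (shape (k,)), leaving the orthogonal complement untouched.
    return z + U @ (target - U.T @ z)

# Two orthogonal 1-D subspaces of a 4-D latent space (illustrative stand-ins
# for, e.g., an f_0 subspace and a formant subspace).
U_a = np.array([[1.0], [0.0], [0.0], [0.0]])
U_b = np.array([[0.0], [1.0], [0.0], [0.0]])

z = np.array([0.3, -0.7, 0.2, 0.5])
z_edit = edit_along_subspace(z, U_a, np.array([1.5]))
```

Because `U_a.T @ U_b` vanishes, the edit changes only the `U_a` coordinate of `z`: the `U_b` coordinate of `z_edit` equals that of `z`, which is the sense in which the factors can be controlled independently.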