Search Results for author: Samir Sadok

Found 4 papers, 1 paper with code

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.

Disentanglement · Image Denoising · +2
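The shared/specific latent factorization described in the abstract can be illustrated with a minimal sketch. The function name and the latent layout (contiguous shared, audio-specific, and visual-specific blocks) are hypothetical choices for illustration, not taken from the paper:

```python
import numpy as np

def split_latents(z, d_shared, d_audio, d_visual):
    """Partition a latent vector into a dynamical part shared between
    modalities and two modality-specific parts (hypothetical layout)."""
    assert z.shape[-1] == d_shared + d_audio + d_visual
    z_shared = z[..., :d_shared]                   # shared audiovisual factors
    z_audio = z[..., d_shared:d_shared + d_audio]  # audio-specific factors
    z_visual = z[..., d_shared + d_audio:]         # visual-specific factors
    return z_shared, z_audio, z_visual

# Example: a 16-dim latent split as 8 shared / 4 audio / 4 visual
z = np.arange(16.0)
zs, za, zv = split_latents(z, 8, 4, 4)
print(zs.shape, za.shape, zv.shape)  # (8,) (4,) (4,)
```

In the actual model the dissociation is learned by the training objective rather than imposed by slicing, but the downstream use (reading off shared vs. specific factors) has this shape.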

A vector quantized masked autoencoder for audiovisual speech emotion recognition

no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier

While fully-supervised models have been shown to be effective for audiovisual speech emotion recognition (SER), the limited availability of labeled data remains a major challenge in the field.

Representation Learning · Self-Supervised Learning · +1

A vector quantized masked autoencoder for speech emotion recognition

no code implementations • 21 Apr 2023 • Samir Sadok, Simon Leglaive, Renaud Séguier

The VQ-MAE-S model is based on a masked autoencoder (MAE) that operates in the discrete latent space of a vector-quantized variational autoencoder.

Self-Supervised Learning · Speech Emotion Recognition
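The masking step of an MAE operating on VQ-VAE code indices, as described above, can be sketched as follows. The masking ratio, codebook size, and helper name are illustrative assumptions, not details from the paper:

```python
import numpy as np

def mask_tokens(tokens, mask_ratio, mask_id, rng):
    """Randomly replace a fraction of discrete VQ-VAE code indices with a
    special mask token, MAE-style (hypothetical helper for illustration)."""
    tokens = np.asarray(tokens)
    n_mask = int(round(mask_ratio * tokens.size))
    idx = rng.choice(tokens.size, size=n_mask, replace=False)
    masked = tokens.copy()
    masked[idx] = mask_id
    return masked, np.sort(idx)

rng = np.random.default_rng(0)
tokens = rng.integers(0, 512, size=20)  # indices into a 512-entry codebook
masked, idx = mask_tokens(tokens, 0.5, mask_id=512, rng=rng)
print((masked == 512).sum())  # 10 positions masked
```

During pretraining, the autoencoder is then trained to predict the original code indices at the masked positions; working on discrete codes rather than raw frames is what distinguishes this setup from a pixel/spectrogram-level MAE.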

Learning and controlling the source-filter representation of speech with a variational autoencoder

1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier

Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal, and based on this orthogonality, we develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.
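The orthogonality-based control described above amounts to editing a latent vector's component inside one subspace while leaving its orthogonal complement untouched. A minimal sketch, with a random orthonormal basis standing in for a learned $f_0$ subspace (the function name and dimensions are assumptions, not the paper's code):

```python
import numpy as np

def edit_in_subspace(z, U, target_coords):
    """Replace the component of latent z lying in the subspace spanned by
    the orthonormal columns of U with new coordinates; the orthogonal
    complement of z is unchanged (illustrative sketch)."""
    proj = U @ (U.T @ z)              # current component in the subspace
    return z - proj + U @ target_coords

rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(16, 2)))  # stand-in for a learned subspace basis
z = rng.normal(size=16)
z_new = edit_in_subspace(z, U, np.array([1.0, -2.0]))
print(np.allclose(U.T @ z_new, [1.0, -2.0]))  # True
```

Because the identified $f_0$ and formant subspaces are orthogonal to one another, such an edit in one subspace does not perturb the coordinates in the others, which is what makes the factors independently controllable.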
