no code implementations • 9 Feb 2024 • Haocheng Liu, Teysir Baoueb, Mathieu Fontaine, Jonathan Le Roux, Gael Richard
Diffusion models are receiving growing interest for a variety of signal generation tasks such as speech or music synthesis.
1 code implementation • 30 Jan 2024 • Elio Gruttadauria, Mathieu Fontaine, Slim Essid
The results show that our system improves on the state of the art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).
no code implementations • 30 Jan 2024 • Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gael Richard
Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation.
1 code implementation • NeurIPS 2023 • Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Slim Essid, Gaël Richard
Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses.
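The Winner-Takes-All idea is simple: among K hypotheses, only the one closest to the target is penalized, which lets the hypotheses specialize on different modes of the output distribution. A minimal NumPy sketch (the function name and the squared-error choice are illustrative, not the paper's exact formulation):

```python
import numpy as np

def wta_loss(hypotheses, target):
    """Winner-Takes-All loss: only the closest hypothesis is penalized.

    hypotheses: (K, D) array of K candidate predictions
    target:     (D,) ground-truth vector
    Returns the squared error of the winning hypothesis and its index.
    """
    errors = np.sum((hypotheses - target) ** 2, axis=1)  # per-hypothesis squared error
    winner = int(np.argmin(errors))                      # index of the best hypothesis
    return errors[winner], winner

# Toy example: 3 hypotheses for a 2-D target; only hypothesis 1 gets gradient
hyps = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
loss, k = wta_loss(hyps, np.array([1.1, 0.9]))  # winner is hypothesis 1
```

In training, gradients flow only through the winning hypothesis, so each head gradually captures one mode of a multimodal target distribution.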
no code implementations • 8 May 2023 • Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii
We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called a neural field.
no code implementations • 22 Jul 2022 • Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
Our DNN-free system leverages the posteriors of the latest source spectrograms given by block-online FastMNMF to derive the current source covariance matrices for frame-online beamforming.
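Once per-source covariance matrices are available, a frame-online beamformer can be formed from them in closed form. A minimal sketch using a plain MVDR beamformer as a stand-in (the paper's exact beamforming formulation may differ; `mvdr_weights` is an illustrative name):

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR beamformer: minimize noise power subject to w^H h = 1.

    noise_cov: (M, M) noise spatial covariance matrix
    steering:  (M,) steering vector toward the target source
    """
    inv = np.linalg.inv(noise_cov)   # inverse noise covariance
    num = inv @ steering
    return num / (steering.conj() @ num)  # normalize for distortionless response

# 2-microphone toy example with identity noise covariance
h = np.array([1.0 + 0j, 1.0 + 0j])
w = mvdr_weights(np.eye(2, dtype=complex), h)
# Distortionless constraint holds: w^H h = 1
```

Because the weights depend only on current covariance estimates, they can be recomputed every frame, which is what makes frame-online operation possible without a DNN in the loop.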
1 code implementation • 15 Jul 2022 • Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii
This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations held in real noisy, echoic environments (e.g., a cocktail party).
no code implementations • 15 Jul 2022 • Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii
This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments.
Ranked #1 on Speech Enhancement on EasyCom (SDR metric)
Automatic Speech Recognition (ASR)
no code implementations • 11 May 2022 • Mathieu Fontaine, Kouhei Sekiguchi, Aditya Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii
This paper describes heavy-tailed extensions of a state-of-the-art versatile blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) from a unified point of view.
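The motivation for heavy-tailed extensions is robustness: under a Gaussian model, impulsive noise or spectral outliers are extremely improbable and can destabilize estimation, whereas a heavy-tailed distribution such as Student's t assigns them far more mass. A small sketch comparing log-likelihoods (real-valued univariate case for illustration; FastMNMF itself works with complex multichannel spectra):

```python
import math

def gauss_logpdf(x, scale):
    """Log-density of a zero-mean Gaussian with standard deviation `scale`."""
    return -0.5 * math.log(2 * math.pi * scale**2) - x**2 / (2 * scale**2)

def student_t_logpdf(x, scale, nu):
    """Log-density of a zero-mean Student's t with nu degrees of freedom."""
    c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
         - 0.5 * math.log(nu * math.pi * scale**2))
    return c - (nu + 1) / 2 * math.log(1 + (x / scale) ** 2 / nu)

# An outlier 8 standard deviations out is far less surprising
# under the heavy-tailed model than under the Gaussian one.
outlier = 8.0
g = gauss_logpdf(outlier, 1.0)          # very negative
s = student_t_logpdf(outlier, 1.0, 2.0) # much less negative
```

As nu grows, Student's t converges to the Gaussian, so the degrees-of-freedom parameter interpolates between the standard model and its robust heavy-tailed variants.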