Search Results for author: Emmanuel Vincent

Found 32 papers, 12 papers with code

Transformer versus LSTM Language Models trained on Uncertain ASR Hypotheses in Limited Data Scenarios

no code implementations LREC 2022 Imran Sheikh, Emmanuel Vincent, Irina Illina

Training of LSTM LMs in such limited data scenarios can benefit from alternate uncertain ASR hypotheses, as observed in our recent work.

Adapting Language Models When Training on Privacy-Transformed Data

no code implementations LREC 2022 Tugtekin Turan, Dietrich Klakow, Emmanuel Vincent, Denis Jouvet

In recent years, voice-controlled personal assistants have revolutionized the interaction with smart devices and mobile applications.

The VoicePrivacy 2024 Challenge Evaluation Plan

1 code implementation 3 Apr 2024 Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states.

Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications

no code implementations 11 Mar 2024 Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent

Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data.

Action Detection Activity Detection +2

End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis

no code implementations 16 Oct 2023 Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent

We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder.

Automatic Speech Recognition Speaker Identification +2

Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS

1 code implementation 28 May 2023 Sewade Ogun, Vincent Colotte, Emmanuel Vincent

Flow-based generative models are widely used in text-to-speech (TTS) systems to learn the distribution of audio features (e.g., Mel-spectrograms) given the input tokens and to sample from this distribution to generate diverse utterances.

Zero-Shot Multi-Speaker TTS

Can we use Common Voice to train a Multi-Speaker TTS system?

1 code implementation 12 Oct 2022 Sewade Ogun, Vincent Colotte, Emmanuel Vincent

We show the viability of this approach for training a multi-speaker GlowTTS model on the Common Voice English dataset.

Automatic Speech Recognition (ASR) +1

The VoicePrivacy 2020 Challenge Evaluation Plan

1 code implementation 14 May 2022 Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.

Benchmarking

The VoicePrivacy 2022 Challenge Evaluation Plan

1 code implementation 23 Mar 2022 Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

Participants apply their developed anonymization systems, run evaluation scripts and submit objective evaluation results and anonymized speech data to the organizers.

Speaker Verification

Differentially Private Speaker Anonymization

no code implementations 23 Feb 2022 Ali Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet, Nathalie Vauquier, Emmanuel Vincent, Mohamed Maouche, Marc Tommasi, Nicolas Papernot

We remove speaker information from these attributes by introducing differentially private feature extractors based on an autoencoder and an automatic speech recognizer, respectively, trained using noise layers.

Automatic Speech Recognition (ASR) +2

Blind Room Parameter Estimation Using Multiple-Multichannel Speech Recordings

1 code implementation 29 Jul 2021 Prerak Srivastava, Antoine Deleforge, Emmanuel Vincent

Knowing the geometrical and acoustical parameters of a room may benefit applications such as audio augmented reality, speech dereverberation or audio forensics.

Speech Dereverberation

UIAI System for Short-Duration Speaker Verification Challenge 2020

no code implementations 26 Jul 2020 Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent

Our primary submission to the challenge is the fusion of seven subsystems which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.

Text-Dependent Speaker Verification

LibriMix: An Open-Source Dataset for Generalizable Speech Separation

5 code implementations 22 May 2020 Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Most deep learning-based speech separation models today are benchmarked on wsj0-2mix.

Audio and Speech Processing

Foreground-Background Ambient Sound Scene Separation

no code implementations 11 May 2020 Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso

Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background.

Introducing the VoicePrivacy Initiative

3 code implementations 4 May 2020 Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.

Benchmarking

Limitations of weak labels for embedding and tagging

1 code implementation 5 Feb 2020 Nicolas Turpault, Romain Serizel, Emmanuel Vincent

Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes and/or overlapping events. In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.

Joint NN-Supported Multichannel Reduction of Acoustic Echo, Reverberation and Noise

no code implementations 20 Nov 2019 Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert

We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise.

Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?

no code implementations 12 Nov 2019 Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent

In this paper, we focus on the protection of speaker identity and study the extent to which users can be recognized based on the encoded representation of their speech as obtained by a deep encoder-decoder architecture trained for ASR.

Automatic Speech Recognition (ASR) +4

Filterbank design for end-to-end speech separation

2 code implementations 23 Oct 2019 Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.

Speaker Recognition Speech Separation

AI in the media and creative industries

no code implementations 10 May 2019 Giuseppe Amato, Malte Behrmann, Frédéric Bimbot, Baptiste Caramiaux, Fabrizio Falchi, Ander Garcia, Joost Geurts, Jaume Gibert, Guillaume Gravier, Hadmut Holken, Hartmut Koenitz, Sylvain Lefebvre, Antoine Liutkus, Fabien Lotte, Andrew Perkis, Rafael Redondo, Enrico Turrin, Thierry Vieville, Emmanuel Vincent

Thanks to the Big Data revolution and increasing computing capacities, Artificial Intelligence (AI) has made an impressive revival over the past few years and is now omnipresent in both research and industry.

A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders

no code implementations 3 May 2019 Manuel Pariente, Antoine Deleforge, Emmanuel Vincent

Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement.

Speech Enhancement Variational Inference

An improved uncertainty propagation method for robust i-vector based speaker recognition

no code implementations 15 Feb 2019 Dayana Ribas, Emmanuel Vincent

So far, different uncertainty propagation methods have been proposed to compensate for noise and reverberation in i-vectors in the context of speaker recognition.

Speaker Recognition Speaker Verification +1

The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines

no code implementations 28 Mar 2018 Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal

The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning.

Automatic Speech Recognition (ASR) +4

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

1 code implementation 1 Jul 2017 Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer are popular signal processing techniques which can improve speech recognition performance.

Speech Recognition
