Search Results for author: Jesús Villalba

Found 26 papers, 6 papers with code

Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification

no code implementations • 29 Feb 2024 • Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak

In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples.

Adversarial Attack Classification +1

Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning

no code implementations • 8 Sep 2023 • Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak

Cascaded SpeechCLIP attempted to generate localized word-level information and utilize both the pretrained image and text encoders.

audio-visual learning Quantization +1

Regularizing Contrastive Predictive Coding for Speech Applications

no code implementations • 12 Apr 2023 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

These representations significantly reduce the amount of labeled data needed for downstream task performance, such as automatic speech recognition.

Acoustic Unit Discovery Automatic Speech Recognition +3

Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak

We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.

Bandwidth Extension Data Augmentation +3

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

no code implementations • 30 Mar 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.

Bandwidth Extension Domain Adaptation +1

Beyond Isolated Utterances: Conversational Emotion Recognition

no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.

Speech Emotion Recognition

Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems

no code implementations • 9 Jul 2021 • Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak

Also, representations trained to classify attacks against speaker identification can be used also to classify attacks against speaker verification and speech recognition.

Representation Learning Speaker Identification +4

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.

Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems

no code implementations • 22 Jan 2021 • Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

Such attacks pose severe security risks, making it vital to deep-dive and understand how much the state-of-the-art SR systems are vulnerable to these attacks.

Speaker Recognition

Focus on the present: a regularization method for the ASR source-target attention layer

no code implementations • 2 Nov 2020 • Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.

Speech Recognition

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.

Segmentation

Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs

1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak

We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance.

Audio and Speech Processing Sound

Unsupervised Feature Enhancement for speaker verification

1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak

The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in speaker verification pipeline or augmentation was used only during x-vector training.

Audio and Speech Processing Sound

Hierarchical Transformers for Long Document Classification

3 code implementations • 23 Oct 2019 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.

Classification Document Classification +3

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

1 code implementation • 1 Apr 2019 • Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).

Feature Engineering Voice Conversion

Unsupervised Adaptation of SPLDA

no code implementations • 20 Nov 2015 • Jesús Villalba

We describe a generative model that produces both sets of data where the unknown labels are modeled like latent variables.

Speaker Diarization +1

Bayesian SPLDA

no code implementations • 20 Nov 2015 • Jesús Villalba

This can be used to adapt SPLDA from one database to another with few development data or to implement the fully Bayesian recipe.

Variational Bayes Factor Analysis for i-Vector Extraction

no code implementations • 20 Nov 2015 • Jesús Villalba

In this document we are going to derive the equations needed to implement a Variational Bayes i-vector extractor.

PLDA with Two Sources of Inter-session Variability

no code implementations • 20 Nov 2015 • Jesús Villalba

This model was applied in the paper "Handling Recordings Acquired Simultaneously over Multiple Channels with PLDA" published at Interspeech 2013.

Speaker Recognition Vocal Bursts Valence Prediction
