Speaker Verification
170 papers with code • 5 benchmarks • 6 datasets
Speaker verification is the task of verifying a person's claimed identity from characteristics of their voice.
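As an illustration of the accept/reject decision, here is a minimal sketch that assumes each utterance has already been converted to a fixed-length embedding by some model (not shown). The function names, the toy vectors, and the threshold of 0.7 are hypothetical; in practice the threshold is tuned on held-out trials.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrolled_embedding, test_embedding, threshold=0.7):
    """Accept the identity claim if the two embeddings are similar enough.

    The threshold is an illustrative assumption, not a recommended value.
    Returns (accepted, score).
    """
    score = cosine_similarity(enrolled_embedding, test_embedding)
    return score >= threshold, score

# Toy embeddings, made up for illustration only.
enrolled = [0.9, 0.1, 0.4]
same_speaker = [0.8, 0.2, 0.5]
different_speaker = [-0.3, 0.9, -0.2]

print(verify(enrolled, same_speaker))
print(verify(enrolled, different_speaker))
```

Raising the threshold trades fewer false accepts for more false rejects, so the operating point depends on the application.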
Most implemented papers
End-to-End Text-Dependent Speaker Verification
In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time.
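The scoring step described above, a test utterance plus a few reference utterances mapped to a single score, can be sketched as follows. This is a simplified illustration, not the paper's implementation. It assumes the embeddings come from a network that is not shown, that the speaker model is the average of the reference embeddings, and that the score is a logistic function of cosine similarity. The names `speaker_model` and `end_to_end_score` and the values of `w` and `b` are hypothetical; in an end-to-end system such parameters would be learned jointly with the embedding network.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def speaker_model(reference_embeddings):
    """Average the reference (enrollment) embeddings element-wise."""
    n = len(reference_embeddings)
    return [sum(column) / n for column in zip(*reference_embeddings)]

def end_to_end_score(test_embedding, reference_embeddings, w=10.0, b=-5.0):
    """Map one test embedding and a few reference embeddings to one score.

    w and b are placeholder values standing in for learned parameters.
    Returns a value in (0, 1); higher means more likely the same speaker.
    """
    model = l2_normalize(speaker_model(reference_embeddings))
    test = l2_normalize(test_embedding)
    cos = sum(x * y for x, y in zip(test, model))
    return 1.0 / (1.0 + math.exp(-(w * cos + b)))

# Toy embeddings, made up for illustration only.
references = [[1.0, 0.0], [0.8, 0.2]]
print(end_to_end_score([0.9, 0.1], references))   # close to the speaker model
print(end_to_end_score([-1.0, 0.0], references))  # far from the speaker model
```

Because the whole pipeline reduces to one differentiable score, a loss on that score can in principle be backpropagated through every component, which is the sense in which the components are optimized jointly.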
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision.
rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method
In the final step, an a posteriori SNR-weighted energy difference is applied to the extended pitch segments of the denoised speech signal to detect voice activity.
Ludwig: a type-based declarative deep learning toolbox
In this work we present Ludwig, a flexible, extensible, and easy-to-use toolbox that allows users to train deep learning models and use them to obtain predictions without writing code.
AutoSpeech: Neural Architecture Search for Speaker Recognition
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
One-class learning towards generalized voice spoofing detection
Human voices can be used to authenticate the identity of the speaker, but automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks such as impersonation, replay, text-to-speech, and voice conversion.
An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems
Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to discern spoofing attacks from bona fide speech trials.
3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition
We propose the use of a coupled 3D Convolutional Neural Network (3D-CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio-visual streams using the learned multimodal features.
Attention-Based Models for Text-Dependent Speaker Verification
Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that spans the entire length of an input sequence.
Scalable Factorized Hierarchical Variational Autoencoder Training
Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations.