Speaker Recognition
90 papers with code • 1 benchmark • 6 datasets
Speaker Recognition is the process of identifying or confirming the identity of a person from their speech segments.
Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Libraries
Use these libraries to find Speaker Recognition models and implementations.
Most implemented papers
Attention-Based Models for Text-Dependent Speaker Verification
Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning due to their ability to summarize relevant information that expands through the entire length of an input sequence.
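The summarization the abstract refers to is often realised as attention pooling: frame-level features are collapsed into a single utterance-level embedding using learned relevance weights. A minimal sketch of that idea follows; the function and parameter names (`attention_pool`, `w`) are illustrative, not taken from the paper.

```python
import numpy as np

def attention_pool(frames, w):
    """Collapse a variable-length sequence of frame vectors into one
    utterance-level embedding via attention weights.

    frames: (T, D) frame-level features; w: (D,) scoring vector.
    """
    scores = frames @ w                             # (T,) relevance per frame
    scores -= scores.max()                          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over time
    return alpha @ frames                           # (D,) weighted average

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 8))                    # 50 frames, 8-dim features
emb = attention_pool(x, rng.standard_normal(8))     # emb.shape == (8,)
```

In a real model `w` (or a small scoring network) is trained jointly with the front-end, so the pooling learns which frames carry speaker-discriminative information.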
VoxCeleb2: Deep Speaker Recognition
The objective of this paper is speaker recognition under noisy and unconstrained conditions.
Speech and Speaker Recognition from Raw Waveform with SincNet
Deep neural networks can learn complex and abstract representations that are progressively obtained by combining simpler ones.
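SincNet's first layer constrains each convolution kernel to an ideal band-pass filter built from two windowed sinc functions, so only the low and high cut-off frequencies are learned per filter. A sketch of that parameterisation, assuming a Hamming window (the function name and arguments are illustrative):

```python
import numpy as np

def sinc_bandpass(f_low, f_high, n_taps, fs):
    """Band-pass FIR filter as the difference of two low-pass sinc
    filters; f_low and f_high (Hz) would be the learnable parameters.
    """
    t = np.arange(n_taps) - (n_taps - 1) / 2        # symmetric tap indices
    lp = lambda fc: 2 * fc / fs * np.sinc(2 * fc / fs * t)  # ideal low-pass
    h = lp(f_high) - lp(f_low)                      # band-pass = LP(hi) - LP(lo)
    return h * np.hamming(n_taps)                   # window to reduce ripple

# A telephone-band filter at 16 kHz sampling rate:
h = sinc_bandpass(300.0, 3400.0, 101, 16000)
```

Because the kernel shape is fixed up to two scalars, the layer has far fewer parameters than a free convolution and yields filters that are directly interpretable as frequency bands.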
Personal VAD: Speaker-Conditioned Voice Activity Detection
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
Filterbank design for end-to-end speech separation
Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
CN-CELEB: a challenging Chinese speaker recognition dataset
These datasets tend to deliver over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.
Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models
We compare the three best architectures trained using our method to select the best one, which is the one with a shallow architecture.
Crossed-Time Delay Neural Network for Speaker Recognition
Time Delay Neural Network (TDNN) is a well-performing structure for DNN-based speaker recognition systems.
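A TDNN layer can be viewed as a dilated 1-D convolution over time: each output frame is a linear function of a few input frames taken at fixed temporal offsets. A minimal numpy sketch under that view (all names are illustrative):

```python
import numpy as np

def tdnn_layer(x, W, b, dilation=1):
    """One TDNN layer as a dilated 1-D convolution with ReLU.

    x: (T, D_in) frame features; W: (K, D_in, D_out) weights for K
    context offsets spaced `dilation` frames apart; b: (D_out,) bias.
    """
    K = W.shape[0]
    span = (K - 1) * dilation                  # total temporal context - 1
    T_out = x.shape[0] - span                  # valid (no-padding) output length
    out = np.zeros((T_out, W.shape[2])) + b
    for k in range(K):                         # sum contributions of each offset
        out += x[k * dilation : k * dilation + T_out] @ W[k]
    return np.maximum(out, 0.0)                # ReLU non-linearity

rng = np.random.default_rng(0)
y = tdnn_layer(rng.standard_normal((100, 24)),
               rng.standard_normal((3, 24, 32)) * 0.1,
               np.zeros(32), dilation=2)       # y.shape == (96, 32)
```

Stacking such layers with growing dilation widens the receptive field quickly while keeping each layer cheap, which is why TDNNs suit frame-rate speaker embedding extractors.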
Speaker anonymisation using the McAdams coefficient
Anonymisation has the goal of manipulating speech signals in order to degrade the reliability of automatic approaches to speaker recognition, while preserving other aspects of speech, such as those relating to intelligibility and naturalness.
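The McAdams approach shifts formant positions by raising the phase of each complex LPC pole to the power of the McAdams coefficient, leaving pole magnitudes (and hence bandwidths) unchanged. A sketch of just that pole transformation, not the full LPC analysis/synthesis pipeline (names are illustrative):

```python
import numpy as np

def mcadams_warp(poles, alpha=0.8):
    """Warp the phase of each complex LPC pole: phi -> |phi|**alpha.

    Real-valued poles carry no formant frequency and are left alone.
    alpha < 1 spreads low-frequency formants apart; alpha = 1 is identity.
    """
    warped = []
    for p in poles:
        if abs(p.imag) > 1e-9:                    # complex pole: has a formant
            r, phi = abs(p), np.angle(p)
            phi = np.sign(phi) * (abs(phi) ** alpha)
            p = r * np.exp(1j * phi)              # same radius, new angle
        warped.append(p)
    return np.array(warped)

w = mcadams_warp(np.array([0.9 * np.exp(0.5j), 0.7 + 0j]), alpha=0.8)
```

In the full system this warp is applied frame by frame to the LPC poles before re-synthesising the residual through the modified filter, degrading automatic speaker recognition while keeping speech intelligible.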
Pushing the limits of raw waveform speaker recognition
Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin.
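The equal error rate (EER) quoted by several of these papers is the operating point where the false-acceptance rate equals the false-rejection rate. A simple threshold-sweep sketch of how it can be computed from verification scores (function and variable names are illustrative):

```python
import numpy as np

def equal_error_rate(target, impostor):
    """EER via a sweep over all candidate thresholds.

    target: scores of genuine (same-speaker) trials;
    impostor: scores of different-speaker trials. Higher = more similar.
    """
    thr = np.sort(np.concatenate([target, impostor]))
    far = np.array([np.mean(impostor >= t) for t in thr])  # false acceptances
    frr = np.array([np.mean(target < t) for t in thr])     # false rejections
    i = np.argmin(np.abs(far - frr))                       # closest crossing
    return (far[i] + frr[i]) / 2

eer = equal_error_rate(np.array([0.8, 0.9]), np.array([0.1, 0.2]))
```

Production evaluation toolkits interpolate between thresholds for an exact crossing, but this discrete sweep gives the same value on large trial lists to within one trial's resolution.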