Speaker Recognition
90 papers with code • 1 benchmark • 6 datasets
Speaker Recognition is the process of identifying or confirming the identity of a person given their speech segments.
Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
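Modern systems typically map each utterance to a fixed-dimensional speaker embedding and compare embeddings to decide identity. A minimal sketch of this scoring step, assuming embeddings have already been extracted by some front-end model (the embedding dimension, threshold, and the random vectors standing in for real embeddings are all illustrative assumptions):

```python
import numpy as np

def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

def verify(enroll_emb: np.ndarray, test_emb: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept the claimed identity if similarity exceeds a tuned threshold."""
    return cosine_score(enroll_emb, test_emb) >= threshold

# Synthetic stand-ins for real embeddings, purely for illustration.
rng = np.random.default_rng(0)
speaker = rng.normal(size=256)                 # enrollment embedding
same = speaker + 0.1 * rng.normal(size=256)    # another utterance, same speaker
other = rng.normal(size=256)                   # a different speaker

print(verify(speaker, same))
print(verify(speaker, other))
```

In practice the threshold is tuned on a development set (e.g. to the equal error rate), and back-ends such as PLDA are often used instead of raw cosine scoring.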
Libraries
Use these libraries to find Speaker Recognition models and implementations.
Latest papers
3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews
If an entry-level graphics card is available, transcription time drops to about 20% of the audio duration.
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets.
SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems
Our attack is versatile and can work in both white-box and black-box scenarios.
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition.
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space.
Vocal Style Factorization for Effective Speaker Recognition in Affective Scenarios
The accuracy of automated speaker recognition is negatively impacted by changes in the emotional state of a person's speech.
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.
Probabilistic Back-ends for Online Speaker Recognition and Clustering
This paper focuses on multi-enrollment speaker recognition which naturally occurs in the task of online speaker clustering, and studies the properties of different scoring back-ends in this scenario.
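One simple multi-enrollment back-end is to average the enrollment embeddings into a single centroid and score the test utterance against it with cosine similarity. The sketch below shows this baseline only; the paper itself studies probabilistic alternatives, and the dimensions and synthetic vectors here are assumptions for illustration:

```python
import numpy as np

def enroll(embeddings: np.ndarray) -> np.ndarray:
    """Average several enrollment embeddings (one per row) into one speaker model."""
    centroid = embeddings.mean(axis=0)
    return centroid / np.linalg.norm(centroid)

def score(model: np.ndarray, test_emb: np.ndarray) -> float:
    """Cosine similarity between the speaker model and a test embedding."""
    return float(np.dot(model, test_emb / np.linalg.norm(test_emb)))

# Synthetic embeddings standing in for a real front-end's output.
rng = np.random.default_rng(1)
speaker = rng.normal(size=128)
enrollments = speaker + 0.2 * rng.normal(size=(3, 128))  # three enrollment utterances
model = enroll(enrollments)

same = speaker + 0.2 * rng.normal(size=128)
other = rng.normal(size=128)
print(score(model, same) > score(model, other))  # target speaker should score higher
```

Averaging before scoring is only one choice; scoring against each enrollment and averaging the scores, or using a probabilistic back-end, can behave differently as the number of enrollment utterances grows, which is the trade-off the paper examines.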