Speaker Recognition
90 papers with code • 1 benchmark • 6 datasets
Speaker Recognition is the task of identifying or confirming a person's identity from segments of their speech.
Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
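As a rough illustration of the verification variant, assume speech segments have already been mapped to fixed-dimensional speaker embeddings (e.g. by an x-vector network; how the embeddings are produced is outside this sketch). A common decision rule is cosine similarity against a tuned threshold:

```python
import numpy as np

def cosine_score(enroll: np.ndarray, test: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(enroll, test) /
                 (np.linalg.norm(enroll) * np.linalg.norm(test)))

def verify(enroll: np.ndarray, test: np.ndarray, threshold: float = 0.5) -> bool:
    """Accept the identity claim if the embeddings are similar enough.

    The threshold value is illustrative; real systems tune it on a
    development set to trade off false accepts against false rejects.
    """
    return cosine_score(enroll, test) >= threshold

# Toy example with random vectors standing in for neural embeddings.
rng = np.random.default_rng(0)
enroll = rng.normal(size=256)
test = enroll + 0.01 * rng.normal(size=256)  # near-identical speaker
print(verify(enroll, test))
```

Identification (rather than verification) would instead score the test embedding against every enrolled speaker and pick the highest-scoring one.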
Libraries
Use these libraries to find Speaker Recognition models and implementations.
Latest papers with no code
TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches
This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models.
Voice Conversion Augmentation for Speaker Recognition on Defective Datasets
Our experimental results on three constructed datasets demonstrate that VCA-NN effectively mitigates these dataset problems, offering a new direction for addressing speaker recognition from the data perspective.
Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2
The SdSV Challenge Task 2 provided an opportunity to assess the efficiency and robustness of modern text-independent speaker verification systems.
Cosine Scoring with Uncertainty for Neural Speaker Embedding
Uncertainty modeling in speaker representation aims to learn the variability present in speech utterances.
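To make the idea concrete, here is a minimal sketch of one way per-dimension uncertainty could enter a cosine score. This is an illustrative assumption (precision-weighting the dimensions), not the paper's actual formulation: dimensions with high estimated variance contribute less to the similarity.

```python
import numpy as np

def uncertainty_cosine(e1: np.ndarray, var1: np.ndarray,
                       e2: np.ndarray, var2: np.ndarray,
                       eps: float = 1e-8) -> float:
    """Cosine score with each dimension scaled by inverse total variance.

    e1, e2     : mean speaker embeddings, shape (d,)
    var1, var2 : per-dimension variance estimates, shape (d,)
    The precision weighting below is a hypothetical choice for
    illustration, not a standard library API.
    """
    w = 1.0 / (var1 + var2 + eps)  # precision weights: low variance -> high weight
    num = np.sum(w * e1 * e2)
    den = np.sqrt(np.sum(w * e1 * e1)) * np.sqrt(np.sum(w * e2 * e2))
    return float(num / den)
```

When all variances are equal, the weights cancel and this reduces to the ordinary cosine score, so the uncertainty term only changes the ranking where the embedding extractor is genuinely less confident.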
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models
Automated speaker identification (SID) is a crucial step for the personalization of a wide range of speech-enabled services.
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to YouTube videos facilitating the creation of a novel speaker recognition dataset.
Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes
From the publicly available speech dataset LibriTTS, we also created LibriTTS-DF, a separate database of audio-only deepfakes, using several recent text-to-speech methods: YourTTS, AdaSpeech, and TorToiSe.
Phonetic-aware speaker embedding for far-field speaker verification
The intuition is that phonetic information can preserve low-level acoustic dynamics with speaker information and thus partly compensate for the degradation due to noise and reverberation.
Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models
Motivated by recent advancements in voice conversion (VC), we propose using knowledge from one short sentence to generate additional synthetic speech samples that sound like the target speaker, called parrot speech.
Detecting Agreement in Multi-party Conversational AI
Today, conversational systems are expected to handle conversations in multi-party settings, especially within Socially Assistive Robots (SARs).