Speaker Recognition

90 papers with code • 1 benchmarks • 6 datasets

Speaker Recognition is the process of identifying or confirming the identity of a person given his speech segments.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Libraries

Use these libraries to find Speaker Recognition models and implementations

Latest papers with no code

TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches

no code yet • 18 Apr 2024

This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models.

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

no code yet • 1 Apr 2024

Our experimental results on three created datasets demonstrated that VCA-NN effectively mitigates these dataset problems, which provides a new direction for handling the speaker recognition problems from the data aspect.

Asymmetric and trial-dependent modeling: the contribution of LIA to SdSV Challenge Task 2

no code yet • 28 Mar 2024

The SdSv challenge Task 2 provided an opportunity to assess efficiency and robustness of modern text-independent speaker verification systems.

Cosine Scoring with Uncertainty for Neural Speaker Embedding

no code yet • 11 Mar 2024

Uncertainty modeling in speaker representation aims to learn the variability present in speech utterances.

Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models

no code yet • 23 Jan 2024

Automated speaker identification (SID) is a crucial step for the personalization of a wide range of speech-enabled services.

Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices

no code yet • 20 Dec 2023

This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to YouTube videos facilitating the creation of a novel speaker recognition dataset.

Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

no code yet • 29 Nov 2023

From the publicly available speech dataset LibriTTS, we also created a separate database of only audio deepfakes LibriTTS-DF using several latest text to speech methods: YourTTS, Adaspeech, and TorToiSe.

Phonetic-aware speaker embedding for far-field speaker verification

no code yet • 27 Nov 2023

The intuition is that phonetic information can preserve low-level acoustic dynamics with speaker information and thus partly compensate for the degradation due to noise and reverberation.

Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models

no code yet • 13 Nov 2023

Motivated by recent advancements in voice conversion (VC), we propose to use the one short sentence knowledge to generate more synthetic speech samples that sound like the target speaker, called parrot speech.

Detecting Agreement in Multi-party Conversational AI

no code yet • 6 Nov 2023

Today, conversational systems are expected to handle conversations in multi-party settings, especially within Socially Assistive Robots (SARs).