Speaker Recognition
90 papers with code • 1 benchmark • 6 datasets
Speaker Recognition is the process of identifying or confirming a person's identity from segments of their speech.
Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
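The definition above covers two related tasks: identification (which enrolled speaker is this?) and verification (is this the claimed speaker?). A minimal sketch of both follows, assuming each utterance has already been mapped to a fixed-length embedding by some model. The toy vectors, the function names, and the 0.7 threshold are illustrative assumptions, not taken from any paper listed here, and cosine scoring is only one common choice of similarity measure.

```python
import math
from typing import Dict, List, Tuple

Embedding = List[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify(test: Embedding, claimed: Embedding, threshold: float = 0.7) -> bool:
    """Verification: accept the claimed identity if the score clears a threshold.

    The threshold value is a hypothetical placeholder; in practice it would be
    tuned on held-out trials.
    """
    return cosine_similarity(test, claimed) >= threshold


def identify(test: Embedding, enrolled: Dict[str, Embedding]) -> Tuple[str, float]:
    """Identification: return the enrolled speaker with the highest score."""
    scores = {name: cosine_similarity(test, emb) for name, emb in enrolled.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-dimensional embeddings (real systems use hundreds of dimensions).
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 0.2, 0.95],
}
test_utterance = [0.8, 0.2, 0.1]

best_speaker, best_score = identify(test_utterance, enrolled)
accepted_as_alice = verify(test_utterance, enrolled["alice"])
accepted_as_bob = verify(test_utterance, enrolled["bob"])
```

Here identification compares the test embedding against every enrolled speaker, while verification makes a single accept/reject decision against one claimed identity.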
Latest papers
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
We propose an objective for perceptual quality based on temporal acoustic parameters.
Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health
The proposed approach minimizes the energy consumption of both data collection and inference by 57%, and is competitive with speaker recognition and traumatic brain injury detection baselines.
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
Inspired by humans comprehending speech in a multi-modal manner, various audio-visual datasets have been constructed.
Inconsistency Ranking-based Noisy Label Detection for High-quality Data
We apply this technique to the automatic speaker verification (ASV) task as a proof of concept.
Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition
On the task of speech emotion detection, we obtain 80.8% ACC with acted emotion samples from CREMA-D, 81.2% ACC with semi-natural emotion samples in IEMOCAP, and 66.9% ACC with natural emotion samples in MSP-Podcast.
Speaker recognition with two-step multi-modal deep cleansing
However, noisy samples (i.e., with wrong labels) in the training set induce confusion and cause the network to learn the incorrect representation.
Toroidal Probabilistic Spherical Discriminant Analysis
It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere.
Risk of re-identification for shared clinical speech recordings
Risk is high for a small search space but drops as the search space grows (precision $> 0.85$ for $< 1 \times 10^{6}$ comparisons, precision $< 0.5$ for $> 3 \times 10^{6}$ comparisons).
Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition
Based on the characteristics of speaker recognition systems (SRSs), we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition.
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel.