Speaker Recognition

90 papers with code • 1 benchmark • 6 datasets

Speaker Recognition is the process of identifying or confirming a person's identity from segments of their speech.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
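The definition above covers two related tasks: identification (which enrolled speaker is this?) and verification (is this the claimed speaker?). The sketch below illustrates both with cosine scoring over fixed-length speaker embeddings. It is only an illustration: the embedding values, the speaker names, the function names, and the 0.7 threshold are hypothetical, and a real system would obtain embeddings from a trained model and tune the threshold on held-out data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(test_embedding, enrolled_embedding, threshold=0.7):
    """Verification: accept the claimed identity if the score clears the threshold.

    The threshold value is an assumption chosen for this toy example.
    """
    return cosine_similarity(test_embedding, enrolled_embedding) >= threshold


def identify(test_embedding, enrolled):
    """Identification: return the enrolled speaker with the highest score."""
    return max(
        enrolled,
        key=lambda name: cosine_similarity(test_embedding, enrolled[name]),
    )


# Hypothetical 3-dimensional embeddings; real embeddings typically have
# hundreds of dimensions.
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.9, 0.2],
}
test_embedding = [0.8, 0.2, 0.1]

best_match = identify(test_embedding, enrolled)
is_alice = verify(test_embedding, enrolled["alice"])
is_bob = verify(test_embedding, enrolled["bob"])
```

Identification always returns the closest enrolled speaker, even for an unknown voice, which is why open-set systems usually combine it with a verification-style threshold.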

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

yunyangzeng/taploss 16 Feb 2023

We propose an objective for perceptual quality based on temporal acoustic parameters.

62

Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health

aditthapron/windowmasking 8 Feb 2023

The proposed approach reduces the energy consumption of both data collection and inference by 57%, and is competitive with speaker recognition and traumatic brain injury detection baselines.

1

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

iip-sogang/olkavs-avspeech 16 Jan 2023

Inspired by humans comprehending speech in a multi-modal manner, various audio-visual datasets have been constructed.

26

Inconsistency Ranking-based Noisy Label Detection for High-quality Data

a43992899/noisyspeakerdetection 1 Dec 2022

We apply this technique to the automatic speaker verification (ASV) task as a proof of concept.

7

Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

morganlee123/deeptalkemotions 15 Nov 2022

On the task of speech emotion detection, we obtain 80.8% ACC with acted emotion samples from CREMA-D, 81.2% ACC with semi-natural emotion samples in IEMOCAP, and 66.9% ACC with natural emotion samples in MSP-Podcast.

5

Speaker recognition with two-step multi-modal deep cleansing

taoruijie/avcleanse 28 Oct 2022

However, noisy samples (i.e., with wrong labels) in the training set induce confusion and cause the network to learn the incorrect representation.

26

Toroidal Probabilistic Spherical Discriminant Analysis

bsxfan/PSDA 27 Oct 2022

It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere.

12

Risk of re-identification for shared clinical speech recordings

neurology-ai-program/speech_risk 18 Oct 2022

Risk is high for a small search space but drops as the search space grows (precision $> 0.85$ for $< 1 \times 10^{6}$ comparisons, precision $< 0.5$ for $> 3 \times 10^{6}$ comparisons).

0

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

SEC4SR/SEC4SR 7 Jun 2022

Based on the characteristics of speaker recognition systems (SRSs), we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition.

25

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

deeplsd/merkel-podcast-corpus 24 May 2022

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel.

11