Speaker Recognition

90 papers with code • 1 benchmark • 6 datasets

Speaker Recognition is the process of identifying or confirming the identity of a person given his speech segments.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
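The definition above covers two tasks: identifying a speaker (picking one out of a set of enrolled speakers) and confirming an identity (accepting or rejecting a claim). The sketch below illustrates the difference using cosine scoring between fixed-length speaker embeddings. It is not taken from the cited paper: the function names, the toy three-dimensional vectors, and the 0.7 threshold are all assumptions made for illustration, and a real system would compute embeddings from audio with a trained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(test_emb, enrolled_emb, threshold=0.7):
    """Verification: accept a claimed identity if the score reaches the threshold.

    The threshold value is an arbitrary assumption for this toy example.
    """
    return cosine(test_emb, enrolled_emb) >= threshold


def identify(test_emb, enrolled):
    """Identification: return the enrolled speaker with the highest score."""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))


# Hand-made toy "embeddings" (hypothetical values, not real model output).
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.9, 0.2],
}
test_emb = [0.8, 0.2, 0.1]

print(identify(test_emb, enrolled))         # closest enrolled speaker
print(verify(test_emb, enrolled["alice"]))  # claim "alice"
print(verify(test_emb, enrolled["bob"]))    # claim "bob"
```

With these toy vectors the test embedding scores far higher against "alice" than against "bob", so identification returns "alice" and only the "alice" claim is accepted.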

Benchmarks

These leaderboards are used to track progress in Speaker Recognition

Dataset	Best Model
VoxCeleb1	WavLM+ECAPA-TDNN

Libraries

Use these libraries to find Speaker Recognition models and implementations

s3prl/s3prl

2 papers

2,104

andi611/Self-Supervised-Speech-Pret…

2 papers

2,104

Jungjee/RawNet

2 papers

332

Datasets

Latest papers

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

yunyangzeng/taploss • 16 Feb 2023

We propose an objective for perceptual quality based on temporal acoustic parameters.

Masking Kernel for Learning Energy-Efficient Representations for Speaker Recognition and Mobile Health

aditthapron/windowmasking • 8 Feb 2023

The proposed approach minimizes the energy consumption of both data collection and inference by 57%, and is competitive with speaker recognition and traumatic brain injury detection baselines.

OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

iip-sogang/olkavs-avspeech • 16 Jan 2023

Inspired by humans comprehending speech in a multi-modal manner, various audio-visual datasets have been constructed.

Inconsistency Ranking-based Noisy Label Detection for High-quality Data

a43992899/noisyspeakerdetection • 1 Dec 2022

We apply this technique to the automatic speaker verification (ASV) task as a proof of concept.

Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

morganlee123/deeptalkemotions • 15 Nov 2022

On the task of speech emotion detection, we obtain 80.8% ACC with acted emotion samples from CREMA-D, 81.2% ACC with semi-natural emotion samples in IEMOCAP, and 66.9% ACC with natural emotion samples in MSP-Podcast.

Speaker recognition with two-step multi-modal deep cleansing

taoruijie/avcleanse • 28 Oct 2022

However, noisy samples (i.e., with wrong labels) in the training set induce confusion and cause the network to learn the incorrect representation.

Toroidal Probabilistic Spherical Discriminant Analysis

bsxfan/PSDA • 27 Oct 2022

It extends PSDA with the ability to model within and between-speaker variabilities in toroidal submanifolds of the hypersphere.

Risk of re-identification for shared clinical speech recordings

neurology-ai-program/speech_risk • 18 Oct 2022

Risk is high for a small search space but drops as the search space grows (precision $> 0.85$ for $< 1 \times 10^{6}$ comparisons, precision $< 0.5$ for $> 3 \times 10^{6}$ comparisons).

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

SEC4SR/SEC4SR • 7 Jun 2022

Based on the characteristics of speaker recognition systems (SRSs), we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition.

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

deeplsd/merkel-podcast-corpus • 24 May 2022

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel.
