Speaker Recognition

90 papers with code • 1 benchmark • 6 datasets

Speaker Recognition is the process of identifying or confirming a person's identity from their speech segments.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
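
The two halves of the definition above, identifying a speaker and confirming a claimed identity, can be illustrated with a minimal sketch. It assumes each speech segment has already been mapped to a fixed-length embedding vector and that embeddings are compared by cosine similarity, which is a common choice but not the only one. The function names, the toy embeddings, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_embedding, enrolled_embedding, threshold=0.7):
    """Verification: confirm a claimed identity (yes/no decision).

    The threshold is an illustrative assumption; in practice it would be
    tuned on held-out trials.
    """
    return cosine_similarity(test_embedding, enrolled_embedding) >= threshold

def identify(test_embedding, enrolled):
    """Identification: pick the best-matching speaker from an enrolled set."""
    return max(
        enrolled,
        key=lambda name: cosine_similarity(test_embedding, enrolled[name]),
    )

# Toy 3-dimensional embeddings (real systems use hundreds of dimensions).
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 0.2, 0.95],
}
test = [0.8, 0.2, 0.1]

best_match = identify(test, enrolled)
accepted_as_alice = verify(test, enrolled["alice"])
accepted_as_bob = verify(test, enrolled["bob"])
```

Here identification returns the closest enrolled speaker regardless of how close the match is, while verification makes an accept/reject decision against a single claimed identity.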

Most implemented papers

Toroidal Probabilistic Spherical Discriminant Analysis

bsxfan/toroidal-psda 27 Oct 2022

It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere.

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

yunyangzeng/taploss 16 Feb 2023

We propose an objective for perceptual quality based on temporal acoustic parameters.

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

espnet/espnet 30 Jan 2024

First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.

Unified Hypersphere Embedding for Speaker Recognition

MahdiHajibabaei/unified-embedding 22 Jul 2018

Incremental improvements in accuracy of Convolutional Neural Networks are usually achieved through use of deeper and more complex models trained on larger datasets.

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

Splinter0/CoughCNN 12 Sep 2018

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Additive Margin SincNet for Speaker Recognition

joaoantoniocn/AM-SincNet 28 Jan 2019

The Softmax loss function is widely used in deep learning methods, but it is not the best choice for all kinds of problems.
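
As a rough illustration of the additive margin idea named in the title, the sketch below computes a softmax cross-entropy loss in which a fixed margin is subtracted from the target-class cosine score before scaling. This is a generic additive margin softmax sketch, not the AM-SincNet implementation; the function name and the scale and margin values (30.0 and 0.35) are illustrative assumptions.

```python
import math

def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive margin softmax loss for a single example.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker class.
    scale, margin: illustrative hyperparameters (assumptions).
    """
    # Subtract the margin from the target-class cosine only, then scale.
    logits = [
        scale * (c - margin) if i == target else scale * c
        for i, c in enumerate(cosines)
    ]
    # Numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # Cross-entropy: negative log-probability of the target class.
    return log_sum - logits[target]

cosines = [0.8, 0.1, -0.2]
loss_plain = am_softmax_loss(cosines, target=0, margin=0.0)   # ordinary scaled softmax
loss_margin = am_softmax_loss(cosines, target=0, margin=0.35)  # with additive margin
```

With `margin=0.0` this reduces to ordinary softmax cross-entropy on scaled cosines. A positive margin raises the loss for the same scores, so training has to push the target-class cosine further above the others to reach the same loss.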

BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition

awslabs/speech-representations 30 Jun 2019

We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.

Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks

KinWaiCheuk/MCE2018 1 Oct 2019

When the training data is reduced to the train set only, our method yields 309 confusions on the multi-target speaker identification task, which is 46% better than the baseline model.

Delving into VoxCeleb: environment invariant speaker recognition

theolepage/sslsv 24 Oct 2019

Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets.