Speaker Verification
170 papers with code • 5 benchmarks • 6 datasets
Speaker verification is the task of verifying a person's claimed identity from characteristics of their voice.
(Image credit: Contrastive-Predictive-Coding-PyTorch)
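In the common embedding-based setup, a system compares an embedding of the test utterance with the enrolled speaker's embedding and accepts the identity claim if the score reaches a threshold. The sketch below assumes cosine scoring and toy 4-dimensional vectors. The function names and the threshold of 0.5 are hypothetical and are not taken from any of the toolkits listed on this page.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(enrollment, test, threshold=0.5):
    """Accept the claim if the trial score reaches the threshold.

    The threshold is a hypothetical value; in practice it is tuned on
    held-out trials for the desired operating point.
    """
    score = cosine_similarity(enrollment, test)
    return score >= threshold, score

# Toy embeddings (real speaker embeddings have hundreds of dimensions).
enrolled = [0.9, 0.1, 0.3, 0.2]
same_speaker = [0.8, 0.2, 0.35, 0.1]
other_speaker = [-0.2, 0.9, -0.1, 0.4]

print(verify(enrolled, same_speaker))   # accepted, score close to 1
print(verify(enrolled, other_speaker))  # rejected, negative score
```

Cosine scoring is only one choice; back-ends such as PLDA are also widely used for the same accept/reject decision.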
Libraries
Use these libraries to find Speaker Verification models and implementations.
Latest papers
3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification
Spoofing detection is now a mainstream research topic.
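A spoofing-robust verifier faces three kinds of trial: target, bona fide non-target, and spoof. A minimal sketch, assuming an a-DCF-style cost that weights the target miss rate and the two false-alarm rates by priors and costs at a fixed threshold, then takes the minimum over thresholds. The prior and cost values and the helper names are hypothetical, and the paper's exact parameterization and normalization are not reproduced here.

```python
def error_rates(target_scores, nontarget_scores, spoof_scores, threshold):
    """Miss rate on targets and false-alarm rates on non-targets and spoofs."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa_non = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    p_fa_spf = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
    return p_miss, p_fa_non, p_fa_spf

def weighted_cost(target_scores, nontarget_scores, spoof_scores, threshold,
                  priors=(0.9, 0.05, 0.05), costs=(1.0, 10.0, 10.0)):
    """Unnormalized a-DCF-style cost at one threshold.

    priors = (pi_tar, pi_non, pi_spf); costs = (C_miss, C_fa_non, C_fa_spf).
    Both defaults are hypothetical illustration values.
    """
    p_miss, p_fa_non, p_fa_spf = error_rates(
        target_scores, nontarget_scores, spoof_scores, threshold)
    pi_tar, pi_non, pi_spf = priors
    c_miss, c_fa_non, c_fa_spf = costs
    return (c_miss * pi_tar * p_miss
            + c_fa_non * pi_non * p_fa_non
            + c_fa_spf * pi_spf * p_fa_spf)

def min_cost(target_scores, nontarget_scores, spoof_scores, **kwargs):
    """Minimum cost over thresholds placed at every observed score."""
    all_scores = sorted(set(target_scores + nontarget_scores + spoof_scores))
    # The extra threshold above the highest score rejects every trial.
    candidates = all_scores + [all_scores[-1] + 1.0]
    return min(weighted_cost(target_scores, nontarget_scores, spoof_scores, t, **kwargs)
               for t in candidates)

targets = [0.9, 0.8, 0.7]
nontargets = [0.1, 0.2]
spoofs = [0.3, 0.75]
print(min_cost(targets, nontargets, spoofs))  # → 0.25
```

With these weights, accepting the 0.75 spoof (cost 0.25) is cheaper than raising the threshold and missing a target (cost 0.3).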
ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification
One promising approach is to align vocal-tract parameters between adults and children through children-specific data augmentation, referred to here as ChildAugment.
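ChildAugment's own procedure is not described here. As a loosely related illustration of altering vocal-tract characteristics through augmentation, the sketch below uses a different, well-known technique, vocal tract length perturbation (VTLP), which warps the frequency axis of a spectrum by a factor alpha; alpha > 1 shifts spectral content upward, roughly as a shorter vocal tract would. The cutoff f_hi = 4800 Hz, the 16 kHz sample rate, and the function names are assumptions.

```python
import bisect

def vtlp_warp(freq, alpha, f_hi=4800.0, sample_rate=16000):
    """Piecewise-linear VTLP mapping of one frequency (Hz) by warp factor alpha."""
    nyquist = sample_rate / 2
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq <= boundary:
        # Low band: scale frequencies directly (alpha > 1 raises formants).
        return alpha * freq
    # High band: linear segment chosen so that Nyquist maps to Nyquist.
    return nyquist - (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary) * (nyquist - freq)

def warp_spectrum(magnitudes, alpha, sample_rate=16000, f_hi=4800.0):
    """Warp one magnitude spectrum (bins from 0 Hz to Nyquist) along frequency.

    Each input bin is moved to its warped frequency, and the result is
    linearly re-interpolated onto the original bin grid.
    """
    n = len(magnitudes)
    nyquist = sample_rate / 2
    freqs = [k * nyquist / (n - 1) for k in range(n)]
    warped = [vtlp_warp(f, alpha, f_hi, sample_rate) for f in freqs]  # increasing
    out = []
    for f in freqs:
        # Locate the warped segment [warped[i], warped[i + 1]] that contains f.
        i = min(max(bisect.bisect_right(warped, f) - 1, 0), n - 2)
        t = (f - warped[i]) / (warped[i + 1] - warped[i])
        out.append((1 - t) * magnitudes[i] + t * magnitudes[i + 1])
    return out

# A single spectral peak at 800 Hz (bin 10 of 101 bins at 16 kHz).
spectrum = [0.0] * 101
spectrum[10] = 1.0
child_like = warp_spectrum(spectrum, alpha=1.2)
print(max(range(len(child_like)), key=child_like.__getitem__))  # → 12 (about 960 Hz)
```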
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.
Singer Identity Representation Learning using Self-Supervised Techniques
Significant strides have been made in creating voice identity representations using speech data.
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Meanwhile, in vision tasks, ConvNet architectures have been modernized by borrowing design elements from Transformers, resulting in improved performance.
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
We represent the stride space on a trellis diagram, conduct a systematic study of the impact of temporal and frequency resolutions on performance, and identify two optimal points, namely Golden Gemini, which serves as a guiding principle for designing 2D ResNet-based speaker verification models.
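The stride study can be made concrete by tracking how each stage's (frequency, time) stride shrinks the input feature map. A sketch, assuming 80 log-mel bins, 200 frames, four stages with stride 1 or 2 per axis, and "same" padding so that each size becomes ceil(size / stride). These values and the helper names are hypothetical; the paper's trellis and its Golden Gemini points are not reproduced.

```python
import math
from itertools import product

def output_resolution(n_freq, n_frames, stage_strides):
    """(freq, time) sizes of the feature map after each stage.

    stage_strides: list of (freq_stride, time_stride) pairs, one per stage.
    Assumes 'same' padding, so each size becomes ceil(size / stride).
    """
    sizes = [(n_freq, n_frames)]
    f, t = n_freq, n_frames
    for sf, st in stage_strides:
        f, t = math.ceil(f / sf), math.ceil(t / st)
        sizes.append((f, t))
    return sizes

def enumerate_endpoints(n_freq, n_frames, n_stages=4, choices=(1, 2)):
    """Group every per-stage stride configuration by its final (freq, time) size."""
    endpoints = {}
    per_stage = list(product(choices, repeat=2))  # (freq_stride, time_stride)
    for config in product(per_stage, repeat=n_stages):
        final = output_resolution(n_freq, n_frames, config)[-1]
        endpoints.setdefault(final, []).append(config)
    return endpoints

print(output_resolution(80, 200, [(1, 1), (2, 2), (2, 2), (2, 2)]))
# → [(80, 200), (80, 200), (40, 100), (20, 50), (10, 25)]
print(len(enumerate_endpoints(80, 200)))  # distinct final resolutions
```

Many stride orders reach the same final resolution, which is one reason to reason about the stride space as paths rather than as individual configurations.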
Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer
A good supervised embedding for a specific machine learning task is only sensitive to changes in the label of interest and is invariant to other confounding factors.
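An intra-class correlation coefficient measures repeatability: the share of variance that lies between classes rather than within repeated measurements of the same class. A sketch of the standard one-way random-effects ICC(1,1), ICC = (MSB - MSW) / (MSB + (k - 1) MSW), applied to one embedding dimension. The paper uses ICC as a training regularizer; the exact ICC variant it uses and how it is turned into a loss are not reproduced, and the function name is hypothetical.

```python
def icc1(groups):
    """One-way random-effects ICC(1,1) for equally sized groups.

    groups: list of lists; each inner list holds k repeated measurements
    of one class (e.g. one embedding dimension over k recordings).
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    means = [sum(g) / k for g in groups]
    grand = sum(means) / n
    # Between-class and within-class mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

print(icc1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # → 1.0 (perfectly repeatable)
```

Values near 1 mean the dimension varies mostly across classes; values near or below 0 mean repeated recordings of the same class disagree as much as different classes do.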
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music.
Pairwise Similarity Learning is SimPLE
In this paper, we focus on a general yet important learning problem, pairwise similarity learning (PSL).
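Pairwise similarity learning trains a model so that a similarity score separates same-class pairs from different-class pairs. The sketch below shows a generic baseline for this problem, a binary logistic loss on a scaled cosine similarity; it is not SimPLE's loss. The scale and bias values and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def pairwise_logistic_loss(pairs, scale=10.0, bias=5.0):
    """Mean binary logistic loss over labelled pairs.

    pairs: list of (emb_a, emb_b, same), with same=True for a positive pair.
    The logit is scale * cos(a, b) - bias; scale and bias are hypothetical.
    """
    total = 0.0
    for a, b, same in pairs:
        logit = scale * cosine(a, b) - bias
        y = 1.0 if same else -1.0
        z = -y * logit
        # Numerically stable softplus(z) = log(1 + exp(z)).
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total / len(pairs)

positive = ([1.0, 0.0], [1.0, 0.0], True)
negative = ([1.0, 0.0], [0.0, 1.0], False)
print(pairwise_logistic_loss([positive, negative]))  # small: pairs are well separated
```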