Speaker Identification

61 papers with code • 4 benchmarks • 4 datasets

Speaker identification is the task of determining which of a set of enrolled speakers produced a given utterance, typically by extracting a speaker embedding from the audio and comparing it against enrolled speaker models.

Most implemented papers

Speaker Recognition from Raw Waveform with SincNet

mravanelli/SincNet 29 Jul 2018

Rather than employing standard hand-crafted features, SincNet learns low-level speech representations directly from raw waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
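
SincNet's first convolutional layer replaces free-form kernels with band-pass filters whose only learnable parameters are the low and high cutoff frequencies. A minimal numpy sketch of that parameterization, with fixed rather than learned cutoffs (the 101-tap length and Hamming window are illustrative choices, not the paper's exact configuration):

```python
import numpy as np

def sinc_bandpass(f_low, f_high, kernel_len=101, sample_rate=16000):
    """Band-pass FIR kernel built as the difference of two low-pass sinc
    filters -- the parameterization SincNet makes learnable via its cutoffs."""
    f1 = f_low / sample_rate   # normalized cutoffs in cycles/sample
    f2 = f_high / sample_rate
    n = np.arange(kernel_len) - (kernel_len - 1) / 2
    # np.sinc(x) = sin(pi x)/(pi x), so 2f*sinc(2f n) is an ideal low-pass response
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    kernel *= np.hamming(kernel_len)      # smooth truncation artifacts
    return kernel / np.abs(kernel).sum()  # normalize filter gain

# Filter a raw waveform with a 300-3400 Hz band (telephone-speech band)
wave = np.random.randn(16000)             # 1 s of dummy audio at 16 kHz
band = np.convolve(wave, sinc_bandpass(300.0, 3400.0), mode="same")
```

In the actual model, `f_low` and `f_high` are trainable tensors updated by backprop, so each of the network's filters converges to a speaker-discriminative frequency band.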

Deep Speaker: an End-to-End Neural Speaker Embedding System

philipperemy/deep-speaker 5 May 2017

We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
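
The hypersphere scoring described above can be sketched in a few lines of numpy; the 512-dimensional embeddings and the 0.7 decision threshold below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def normalize(x):
    """Project an embedding onto the unit hypersphere."""
    return x / np.linalg.norm(x)

def cosine_similarity(a, b):
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
emb_a = normalize(rng.standard_normal(512))  # embedding of utterance A
emb_b = normalize(rng.standard_normal(512))  # embedding of utterance B
score = cosine_similarity(emb_a, emb_b)
same_speaker = score > 0.7  # hypothetical verification threshold
```

Because both embeddings lie on the unit sphere, cosine similarity reduces to a dot product, which keeps verification cheap at inference time.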

ATST: Audio Representation Learning with Teacher-Student Transformer

Audio-WestlakeU/audiossl 26 Apr 2022

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers that knowledge to a specific problem with a limited amount of labeled data.

Masked Autoencoders that Listen

facebookresearch/audiomae 13 Jul 2022

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
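
A minimal sketch of that high-ratio random masking step, assuming the spectrogram has already been cut into patches and flattened into tokens (the 12-token, 256-dimensional shapes are illustrative):

```python
import numpy as np

def random_mask(patches, mask_ratio=0.8, rng=None):
    """Keep only a random subset of patch tokens, MAE-style.
    Returns the visible patches, their indices, and the masked indices."""
    rng = rng or np.random.default_rng()
    num = patches.shape[0]
    num_keep = int(num * (1 - mask_ratio))
    perm = rng.permutation(num)
    keep_idx, mask_idx = perm[:num_keep], perm[num_keep:]
    return patches[keep_idx], keep_idx, mask_idx

# e.g. a 64x48 log-mel spectrogram cut into 16x16 patches -> 12 tokens of dim 256
patch_tokens = np.random.randn(12, 256)
visible, keep_idx, mask_idx = random_mask(patch_tokens, mask_ratio=0.75)
# Only `visible` goes through the encoder; the decoder reconstructs the
# masked positions from mask tokens plus positional embeddings.
```

The payoff is efficiency: with a 75-80% masking ratio, the encoder processes only a quarter or less of the tokens during pre-training.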

AM-MobileNet1D: A Portable Model for Speaker Recognition

joaoantoniocn/AM-MobileNet1D 31 Mar 2020

To address this demand, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for speaker identification on mobile devices.
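
The "additive margin" refers to the AM-softmax loss, which subtracts a margin from the target class's cosine score before the scaled softmax, forcing tighter class boundaries. A hedged numpy sketch (the scale `s=30` and margin `m=0.35` are common choices in the AM-softmax literature, not necessarily this paper's exact values):

```python
import numpy as np

def am_softmax_loss(embedding, weights, label, s=30.0, m=0.35):
    """Additive-margin softmax: penalize the target class's cosine score
    by m, then apply a scaled softmax cross-entropy."""
    e = embedding / np.linalg.norm(embedding)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = w @ e                 # cosine score per class
    cos[label] -= m             # additive margin on the target class only
    logits = s * cos
    logits -= logits.max()      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[label]

rng = np.random.default_rng(1)
loss = am_softmax_loss(rng.standard_normal(128), rng.standard_normal((10, 128)), label=3)
```

Since the margin only penalizes the target class at training time, inference is unchanged: classification still picks the highest cosine score.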

AutoSpeech: Neural Architecture Search for Speaker Recognition

TAMU-VITA/AutoSpeech 7 May 2020

Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation

andi611/Self-Supervised-Speech-Pretraining-and-Representation-Learning 18 May 2020

We use the representations for two downstream tasks: speaker identification and phoneme classification.

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

microsoft/speecht5 ACL 2022

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Learning Speaker Representations with Mutual Information

Js-Mim/rl_singing_voice 1 Dec 2018

Mutual Information (MI) and similar measures of statistical dependence are promising tools for learning speaker representations in an unsupervised way.