Speaker Identification
61 papers with code • 4 benchmarks • 4 datasets
Latest papers with no code
TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches
This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models.
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling
Recently, there have been efforts to encode the linguistic information of speech using a self-supervised framework for speech synthesis.
Neural Networks Hear You Loud And Clear: Hearing Loss Compensation Using Deep Neural Networks
In this study, we propose a DNN-based approach for hearing-loss compensation, which is trained on the outputs of hearing-impaired and normal-hearing DNN-based auditory models in response to speech signals.
A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement
Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others.
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples.
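The detection setup described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the feature vectors are random stand-ins for whatever representations would be extracted from benign and adversarial audio, and the logistic-regression detector is one simple choice of binary classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in features: in practice these would be
# extracted from benign and adversarial audio examples.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(200, 40))
adversarial = rng.normal(0.5, 1.0, size=(200, 40))  # shifted distribution

X = np.vstack([benign, adversarial])
y = np.array([0] * 200 + [1] * 200)  # 0 = benign, 1 = adversarial

# Binary detector distinguishing benign from adversarial inputs.
detector = LogisticRegression(max_iter=1000).fit(X, y)
scores = detector.predict_proba(X)[:, 1]  # probability of "adversarial"
```

At inference time, an input whose score exceeds a chosen threshold would be flagged as adversarial before it reaches the speaker-identification model.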
Effect of utterance duration and phonetic content on speaker identification using second-order statistical methods
The goal is to investigate the kind of information used by these methods and where it is located in the speech signal.
Significance of Chirp MFCC as a Feature in Speech and Audio Applications
A novel feature, based on the chirp z-transform, that offers an improved representation of the underlying true spectrum is proposed.
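As a rough illustration of the idea (not the paper's exact feature), cepstral coefficients can be computed from a chirp z-transform spectrum instead of the usual FFT. Evaluating the z-transform on a circle of radius r < 1 moves the analysis contour closer to the vocal-tract poles, which can sharpen the spectral envelope estimate. All parameter values below are illustrative assumptions.

```python
import numpy as np

def chirp_cepstrum(frame, n_bins=256, n_coeffs=13, r=0.99):
    """Cepstral coefficients from a chirp z-transform spectrum (sketch)."""
    n = np.arange(len(frame))
    k = np.arange(n_bins)
    # Contour points z_k = r * exp(j*pi*k/n_bins): the upper half of a
    # circle of radius r (direct O(N*M) evaluation, kept simple for clarity).
    z = r * np.exp(1j * np.pi * k / n_bins)
    spectrum = (frame[None, :] * z[:, None] ** (-n[None, :])).sum(axis=1)
    log_mag = np.log(np.abs(spectrum) + 1e-10)
    # DCT-II of the log-magnitude spectrum -> cepstral coefficients.
    m = np.arange(n_coeffs)
    basis = np.cos(np.pi * m[:, None] * (2 * k[None, :] + 1) / (2 * n_bins))
    return basis @ log_mag

rng = np.random.default_rng(0)
frame = rng.standard_normal(400)  # one 25 ms frame at 16 kHz
coeffs = chirp_cepstrum(frame)
```

With r = 1 the contour reduces to the unit circle and the result matches a standard FFT-based cepstrum over the same half-spectrum.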
Probing Self-supervised Learning Models with Target Speech Extraction
TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation.
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis
This paper proposes a speech rhythm-based method for speaker embeddings to model phoneme duration using a few utterances by the target speaker.
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models
Automated speaker identification (SID) is a crucial step for the personalization of a wide range of speech-enabled services.