Speaker Identification
61 papers with code • 4 benchmarks • 4 datasets
Latest papers with no code
VoxCeleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to YouTube videos facilitating the creation of a novel speaker recognition dataset.
Efficiency-oriented approaches for self-supervised speech representation learning
Self-supervised learning enables the training of large neural models without the need for large, labeled datasets.
Privacy-preserving Representation Learning for Speech Understanding
In this paper, we present a novel framework to anonymize utterance-level speech embeddings generated by pre-trained encoders and show its effectiveness for a range of speech classification tasks.
Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition
In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis
We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder.
Test-Time Training for Speech
In this paper, we study the application of Test-Time Training (TTT) as a solution to handling distribution shifts in speech applications.
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks
Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing.
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception.
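Barlow Twins drives the cross-correlation matrix between the embeddings of two augmented views toward the identity, so matched dimensions agree (invariance) while distinct dimensions decorrelate (redundancy reduction). A minimal NumPy sketch of that loss, with an illustrative weight `lam` rather than any value from the study:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins loss: push the cross-correlation matrix of two
    batch-standardized embedding views toward the identity matrix.
    z_a, z_b: (batch, dim) embeddings of two augmented views."""
    n, d = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / z_a.std(axis=0)
    z_b = (z_b - z_b.mean(axis=0)) / z_b.std(axis=0)
    c = z_a.T @ z_b / n  # (dim, dim) cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Feeding the same embeddings as both views makes the diagonal of the cross-correlation exactly 1, so only the (small, λ-weighted) off-diagonal redundancy term remains.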
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.
VoxWatch: An open-set speaker recognition benchmark on VoxCeleb
Prior studies of open-set speaker recognition are sparse and lack a common benchmark for systematic evaluation.