Speaker Identification

61 papers with code • 4 benchmarks • 4 datasets

Latest papers with no code

VoxCeleb-ESP: preliminary experiments detecting Spanish celebrities from their voices

no code yet • 20 Dec 2023

This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to YouTube videos facilitating the creation of a novel speaker recognition dataset.

Efficiency-oriented approaches for self-supervised speech representation learning

no code yet • 18 Dec 2023

Self-supervised learning enables the training of large neural models without the need for large, labeled datasets.

Privacy-preserving Representation Learning for Speech Understanding

no code yet • 26 Oct 2023

In this paper, we present a novel framework to anonymize utterance-level speech embeddings generated by pre-trained encoders and show its effectiveness for a range of speech classification tasks.

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

no code yet • 17 Oct 2023

In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.

End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis

no code yet • 16 Oct 2023

We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder.

Test-Time Training for Speech

no code yet • 19 Sep 2023

In this paper, we study the application of Test-Time Training (TTT) as a way to handle distribution shifts in speech applications.

Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks

no code yet • 18 Sep 2023

Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing.

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

no code yet • 7 Sep 2023

This study provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired by theories of redundancy reduction in human perception.

Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

no code yet • 6 Jul 2023

We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

no code yet • 30 Jun 2023

Prior studies on open-set speaker recognition are sparse and lack a common benchmark for systematic evaluation.