Speaker Identification
61 papers with code • 4 benchmarks • 4 datasets
Latest papers
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.
Cross-Lingual Speaker Identification Using Distant Supervision
Speaker identification, determining which character said each utterance in literary text, benefits many downstream tasks.
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages
We hope IndicSUPERB contributes to the progress of developing speech language understanding models for Indian languages.
Masked Autoencoders that Listen
Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
Extended U-Net for Speaker Verification in Noisy Environments
Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility.
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
PaddleSpeech is an open-source all-in-one speech toolkit.
EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification
Knowledge-based authentication is crucial for task-oriented spoken dialogue systems that offer personalised and privacy-focused services.
ATST: Audio Representation Learning with Teacher-Student Transformer
Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data, and then transfers that knowledge to a specific problem with a limited amount of labeled data.
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
Historically, shared speech datasets and benchmarks have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.