Speaker Separation
11 papers with code • 0 benchmarks • 3 datasets
Latest papers with no code
Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence
The global activity functions of each speaker are estimated from a simplex constructed using the eigenvectors of the SCM, while the local coherence functions are computed from the coherence between the wRTFs of a time-frequency bin and the global activity function-weighted RTF of the target speaker.
Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network
Binaural speech separation in real-world scenarios often involves moving speakers.
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge
To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and observe large improvements by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we leverage a recent dual window size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2; (3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction; (4) we propose a fine-tuning step, where we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram.
Multi-resolution location-based training for multi-channel continuous speech separation
The performance of automatic speech recognition (ASR) systems severely degrades when multi-talker speech overlap occurs.
Deep neural network techniques for monaural speech enhancement: state of the art analysis
We also review the use of pre-trained speech-enhancement models to boost the speech enhancement process.
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings
Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks.
Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation
Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a mixture of multiple talkers given an enrollment utterance of that speaker.
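As a toy illustration of the TSS setup described above (not the method of any listed paper, and with hypothetical helper names), the enrollment utterance can be summarized into a fixed embedding that then conditions a soft mask over the mixture. A minimal numpy sketch on magnitude spectrograms:

```python
import numpy as np

def enrollment_embedding(enroll_spec):
    # Toy speaker "embedding": mean magnitude spectrum of the
    # enrollment utterance, L2-normalized. Real systems use a
    # learned speaker encoder (e.g. d-vectors or x-vectors).
    emb = enroll_spec.mean(axis=1)
    return emb / np.linalg.norm(emb)

def extract_target(mixture_spec, enroll_emb, sharpness=5.0):
    # Cosine similarity between each mixture frame and the enrollment
    # embedding serves as a soft mask favoring target-dominated frames.
    frames = mixture_spec / (np.linalg.norm(mixture_spec, axis=0,
                                            keepdims=True) + 1e-8)
    sim = enroll_emb @ frames                       # similarity per frame
    mask = 1.0 / (1.0 + np.exp(-sharpness * (sim - sim.mean())))
    return mixture_spec * mask                      # masked estimate
```

In practice the mask is predicted per time-frequency bin by a neural network conditioned on the embedding; this frame-level version only illustrates how enrollment information steers the extraction.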
Individualized Conditioning and Negative Distances for Speaker Separation
Speaker separation aims to extract multiple voices from a mixed signal.
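A common way to frame the separation problem in the snippet above is time-frequency masking: each speaker is recovered by weighting the mixture spectrogram with a per-speaker mask. A minimal numpy sketch using oracle ideal ratio masks (purely illustrative; real systems estimate the masks with a neural network):

```python
import numpy as np

def ideal_ratio_masks(source_specs, eps=1e-8):
    # Oracle masks: each speaker's share of the mixture
    # magnitude in every time-frequency bin.
    mixture = sum(source_specs)
    return [s / (mixture + eps) for s in source_specs]

# Toy magnitude spectrograms (freq x time) for two speakers.
s1 = np.array([[2., 0.], [0., 1.]])
s2 = np.array([[0., 1.], [3., 0.]])
mix = s1 + s2
m1, m2 = ideal_ratio_masks([s1, s2])
est1, est2 = mix * m1, mix * m2  # masking recovers each voice
```

Because these oracle masks require the clean sources, they are used only as training targets or performance upper bounds.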
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches
However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings.
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
Therefore, we propose a second approach, WD-SOT, which addresses alignment errors by introducing a word-level diarization model, removing the dependency on timestamp alignment.