Speaker Separation

11 papers with code • 0 benchmarks • 3 datasets

Speaker separation is the task of recovering the individual voices of multiple talkers from a mixed (overlapping) speech signal, either blindly or with auxiliary cues such as an enrollment utterance of a target speaker.

Latest papers with no code

Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence

no code yet • 13 Mar 2023

The global activity functions of each speaker are estimated from a simplex constructed using the eigenvectors of the SCM, while the local coherence functions are computed from the coherence between the wRTFs of a time-frequency bin and the global activity function-weighted RTF of the target speaker.
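For orientation, here is a minimal NumPy sketch of the kind of spatial statistics this description relies on: eigen-decomposing a spatial covariance matrix (SCM) of relative transfer functions (RTFs) and computing a normalized-inner-product coherence. The function names, input shapes, and the specific coherence form are illustrative assumptions, not the authors' code.

```python
import numpy as np

def spatial_covariance_eigvecs(rtf, n_speakers):
    """Eigen-decompose the spatial covariance matrix (SCM) of per-frame
    relative transfer functions.

    rtf: (frames, mics) complex array of frame-level RTF estimates for one
         frequency band (hypothetical input format).
    Returns the leading eigenvectors, i.e. the directions that would span
    the simplex from which global activity functions are estimated."""
    scm = rtf.conj().T @ rtf / rtf.shape[0]      # (mics, mics) SCM
    eigvals, eigvecs = np.linalg.eigh(scm)       # eigenvalues in ascending order
    return eigvecs[:, -n_speakers:]              # keep the dominant eigenvectors

def local_coherence(w_rtf_bin, target_rtf):
    """One plausible form of a 'local coherence function': the magnitude-
    normalized inner product between the weighted RTF (wRTF) of a single
    time-frequency bin and the activity-weighted RTF of the target speaker."""
    num = np.abs(np.vdot(target_rtf, w_rtf_bin))
    den = np.linalg.norm(target_rtf) * np.linalg.norm(w_rtf_bin) + 1e-12
    return num / den
```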

Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network

no code yet • 13 Mar 2023

Binaural speech separation in real-world scenarios often involves moving speakers.

Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

no code yet • 15 Feb 2023

To address the challenges encountered in the CEC2 setting, we introduce four major novelties:
(1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and observe large improvements by replacing the TCNDenseNet used in iNeuBe with this new architecture;
(2) we leverage a recent dual window size approach with future-frame prediction to ensure that iNeuBe-X satisfies the 5 ms constraint on algorithmic latency required by CEC2;
(3) we introduce a novel speaker-conditioning branch for TF-GridNet to achieve target speaker extraction;
(4) we propose a fine-tuning step in which we compute an additional loss with respect to the target speaker signal compensated with the listener audiogram.
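The speaker-conditioning branch in novelty (3) is described only at a high level here; a common way to realize such conditioning is FiLM-style feature modulation driven by a target-speaker embedding. The PyTorch sketch below shows that generic pattern under the assumption of 4-D time-frequency features; it is not the actual TF-GridNet branch from the paper.

```python
import torch
import torch.nn as nn

class SpeakerFiLM(nn.Module):
    """Generic FiLM-style speaker-conditioning block: a speaker embedding
    predicts per-channel scale and shift applied to intermediate features.
    A common pattern for target speaker extraction, not the exact branch
    used in the submission above."""

    def __init__(self, feat_channels: int, spk_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(spk_dim, feat_channels)
        self.to_shift = nn.Linear(spk_dim, feat_channels)

    def forward(self, feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feats: (batch, channels, time, freq); spk_emb: (batch, spk_dim)
        scale = self.to_scale(spk_emb)[:, :, None, None]
        shift = self.to_shift(spk_emb)[:, :, None, None]
        return feats * (1.0 + scale) + shift
```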

Multi-resolution location-based training for multi-channel continuous speech separation

no code yet • 16 Jan 2023

The performance of automatic speech recognition (ASR) systems severely degrades when multi-talker speech overlap occurs.

Deep neural network techniques for monaural speech enhancement: state of the art analysis

no code yet • 1 Dec 2022

We also review the use of speech-enhancement pre-trained models to boost the speech enhancement process.

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings

no code yet • 1 Nov 2022

Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks.

Quantitative Evidence on Overlooked Aspects of Enrollment Speaker Embeddings for Target Speaker Separation

no code yet • 23 Oct 2022

Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a mixture of multiple talkers given an enrollment utterance of that speaker.
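As a reference point, here is a minimal sketch of how an enrollment utterance is typically turned into a conditioning vector for TSS: mean-pool frame-level speaker embeddings and L2-normalize. The helper name and pooling recipe are assumptions; the paper above examines precisely how such enrollment-embedding choices affect separation.

```python
import torch

def enrollment_embedding(frame_embs: torch.Tensor) -> torch.Tensor:
    """Collapse frame-level speaker embeddings of an enrollment utterance
    into a single utterance-level vector by mean pooling + L2 normalization.
    One common recipe among several; input shape is assumed.

    frame_embs: (frames, dim) embeddings from any speaker encoder."""
    utt = frame_embs.mean(dim=0)
    return utt / (utt.norm() + 1e-8)
```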

Individualized Conditioning and Negative Distances for Speaker Separation

no code yet • 12 Oct 2022

Speaker separation aims to extract multiple voices from a mixed signal.
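For readers unfamiliar with how such models are usually trained, below is a minimal PyTorch sketch of the standard permutation-invariant SI-SNR objective for speaker separation. This is the generic training criterion for the task, not the individualized-conditioning or negative-distance method of the paper above.

```python
import itertools
import torch

def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR between estimated and reference waveforms (1-D tensors)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    proj = (torch.dot(est, ref) / (torch.dot(ref, ref) + eps)) * ref
    noise = est - proj
    return 10 * torch.log10((proj.pow(2).sum() + eps) / (noise.pow(2).sum() + eps))

def pit_si_snr_loss(estimates: torch.Tensor, references: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant training (PIT) loss: score every assignment of
    estimated sources to reference speakers and keep the best one.
    estimates, references: (n_speakers, samples)."""
    n = references.shape[0]
    best = None
    for perm in itertools.permutations(range(n)):
        score = torch.stack(
            [si_snr(estimates[i], references[p]) for i, p in enumerate(perm)]
        ).mean()
        if best is None or score > best:
            best = score
    return -best  # negative SI-SNR, to be minimized
```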

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

no code yet • 4 Apr 2022

However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings.
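One simple way to diagnose the target-confusion problem described above is to check whether the extracted signal's speaker embedding is closer to the interfering speaker than to the enrolled target. The sketch below is a hypothetical diagnostic, not the analysis or metric used in the paper.

```python
import torch

def is_target_confused(est_emb: torch.Tensor,
                       target_emb: torch.Tensor,
                       interferer_emb: torch.Tensor) -> bool:
    """Crude target-confusion check on 1-D speaker embeddings: returns True
    if the extracted output resembles the interferer more than the target."""
    sim_target = torch.nn.functional.cosine_similarity(est_emb, target_emb, dim=0)
    sim_interf = torch.nn.functional.cosine_similarity(est_emb, interferer_emb, dim=0)
    return bool(sim_interf > sim_target)
```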

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code yet • 31 Mar 2022

Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which removes the dependency on timestamp alignment.
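The word-level attribution idea can be illustrated with a small sketch: each decoded word representation is assigned to the speaker profile with the highest cosine similarity, so no frame-level timestamps are needed. The shapes and helper below are illustrative assumptions, not the WD-SOT model itself.

```python
import torch

def assign_words_to_speakers(word_reprs: torch.Tensor,
                             speaker_profiles: torch.Tensor) -> torch.Tensor:
    """Assign each decoded word to a speaker by cosine similarity.

    word_reprs:       (n_words, dim)    word-level representations (assumed)
    speaker_profiles: (n_speakers, dim) enrolled or estimated speaker embeddings
    Returns a (n_words,) tensor of speaker indices."""
    w = torch.nn.functional.normalize(word_reprs, dim=-1)
    s = torch.nn.functional.normalize(speaker_profiles, dim=-1)
    sims = w @ s.T                      # (n_words, n_speakers) similarity matrix
    return sims.argmax(dim=-1)
```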