Speech Separation
97 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem, where the focus is only on the overlapping speech sources; other interference such as music or noise is not the main concern. A minimal sketch of this setup is shown below the attribution lines.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
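As a concrete illustration of the setup, the sketch below (a minimal, self-contained example, not tied to any paper listed on this page) mixes two synthetic "speakers" by simple addition and scores an estimate with the scale-invariant signal-to-noise ratio (SI-SNR), a metric commonly used to evaluate separation quality. All names and signal parameters are illustrative.

```python
import math
import torch

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB: project the estimate onto the target and
    compare the projected 'signal' energy with the residual energy."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    s_target = (torch.dot(estimate, target) / (target.pow(2).sum() + eps)) * target
    e_noise = estimate - s_target
    return 10 * torch.log10(s_target.pow(2).sum() / (e_noise.pow(2).sum() + eps))

# Two toy "speakers" as sine waves; the observed mixture is simply their sum.
t = torch.linspace(0, 1, 8000)
s1 = torch.sin(2 * math.pi * 220 * t)
s2 = torch.sin(2 * math.pi * 330 * t)
mixture = s1 + s2

# A separation model would estimate s1 and s2 from the mixture; here we just
# score the trivial (bad) estimate "mixture" against the first source.
print(f"SI-SNR(mixture vs s1): {si_snr(mixture, s1):.2f} dB")
```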
Libraries
Use these libraries to find Speech Separation models and implementations.
Latest papers
Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation
To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation
In this work, deformable convolution is proposed as a solution to allow TCN models to have dynamic receptive fields (RFs) that can adapt to various reverberation times for reverberant speech separation.
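To make the idea concrete, here is a minimal, hypothetical sketch (plain PyTorch, not the paper's implementation) of a depthwise 1-D deformable convolution: a small convolution predicts a fractional offset for every kernel tap and time step, the input is sampled at those shifted positions by linear interpolation, and the samples are combined with ordinary per-channel weights, giving the layer a data-dependent receptive field.

```python
import torch
import torch.nn as nn

class DeformableConv1d(nn.Module):
    """Illustrative depthwise 1-D deformable convolution (sketch only)."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.dilation = dilation
        # Predict one fractional offset per kernel tap and time step.
        self.offset_conv = nn.Conv1d(channels, kernel_size, kernel_size,
                                     padding=dilation * (kernel_size - 1) // 2,
                                     dilation=dilation)
        # Per-channel weights for each kernel tap (depthwise combination).
        self.weight = nn.Parameter(torch.randn(channels, kernel_size) * 0.1)

    def forward(self, x):                       # x: (batch, channels, time)
        b, c, t = x.shape
        offsets = self.offset_conv(x)           # (batch, kernel_size, time)
        base = torch.arange(t, device=x.device).float()
        out = torch.zeros_like(x)
        for k in range(self.kernel_size):
            # Fractional sampling position for tap k at every time step.
            pos = base + (k - self.kernel_size // 2) * self.dilation + offsets[:, k]
            pos = pos.clamp(0, t - 1)
            lo, hi = pos.floor().long(), pos.ceil().long()
            frac = (pos - lo.float()).unsqueeze(1)              # (b, 1, t)
            lo_idx = lo.unsqueeze(1).expand(b, c, t)
            hi_idx = hi.unsqueeze(1).expand(b, c, t)
            # Linear interpolation between the two nearest integer positions.
            sample = (1 - frac) * x.gather(2, lo_idx) + frac * x.gather(2, hi_idx)
            out = out + self.weight[:, k].view(1, c, 1) * sample
        return out

layer = DeformableConv1d(channels=64)
print(layer(torch.randn(2, 64, 200)).shape)     # torch.Size([2, 64, 200])
```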
CasNet: Investigating Channel Robustness for Speech Separation
In this study, inheriting the use of our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation.
OCD: Learning to Overfit with Conditional Diffusion Models
We present a dynamic model in which the weights are conditioned on an input sample x and are learned to match those that would be obtained by finetuning a base model on x and its label y.
An efficient encoder-decoder architecture with top-down attention for speech separation
In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10% of Sepformer's and CPU inference time only 24% of Sepformer's.
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Convolution-augmented transformers (Conformers) are recently proposed in various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, as they can capture both local and global dependencies.
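For reference, a conformer-style block interleaves multi-head self-attention (global context) with a depthwise convolution module (local context) between two half-step feed-forward layers. The sketch below is a minimal PyTorch rendering of that generic block, not CMGAN's exact architecture; all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvModule(nn.Module):
    """Pointwise conv + GLU, depthwise conv, norm, Swish, pointwise conv:
    captures local dependencies along the time axis."""
    def __init__(self, dim, kernel_size=31):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.pw1 = nn.Conv1d(dim, 2 * dim, 1)
        self.dw = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.bn = nn.BatchNorm1d(dim)
        self.pw2 = nn.Conv1d(dim, dim, 1)

    def forward(self, x):                     # x: (batch, time, dim)
        y = self.norm(x).transpose(1, 2)      # (batch, dim, time)
        y = F.glu(self.pw1(y), dim=1)
        y = self.pw2(F.silu(self.bn(self.dw(y))))
        return x + y.transpose(1, 2)

class ConformerBlock(nn.Module):
    """Half-step FFN -> self-attention (global) -> conv module (local) ->
    half-step FFN -> LayerNorm (macaron-style arrangement)."""
    def __init__(self, dim=256, heads=4, kernel_size=31, ff_mult=4):
        super().__init__()
        def ffn():
            return nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, ff_mult * dim),
                                 nn.SiLU(), nn.Linear(ff_mult * dim, dim))
        self.ff1, self.ff2 = ffn(), ffn()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = ConvModule(dim, kernel_size)
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, time, dim)
        x = x + 0.5 * self.ff1(x)
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = self.conv(x)
        x = x + 0.5 * self.ff2(x)
        return self.final_norm(x)

block = ConformerBlock(dim=64, heads=4)
print(block(torch.randn(2, 100, 64)).shape)   # torch.Size([2, 100, 64])
```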
Analysis of impact of emotions on target speech extraction and speech separation
One of the factors causing such degradation may be intrinsic speaker variability, such as emotions, occurring commonly in realistic speech.
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research.
Resource-Efficient Separation Transformer
Transformers have recently achieved state-of-the-art performance in speech separation.
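As a hedged usage sketch, one way to try such a pretrained separation transformer is through the SpeechBrain toolkit, assuming its SepformerSeparation interface and the speechbrain/sepformer-wsj02mix checkpoint are available; the import path and file names below are assumptions and may differ between versions.

```python
# Assumes: pip install speechbrain torchaudio, and network access to the
# speechbrain/sepformer-wsj02mix checkpoint on the Hugging Face Hub.
import torchaudio
from speechbrain.pretrained import SepformerSeparation

model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix",   # 2-speaker SepFormer checkpoint
    savedir="pretrained_sepformer",
)

# Separate a two-speaker mixture file; "mixture_2spk.wav" is a placeholder path.
est_sources = model.separate_file(path="mixture_2spk.wav")

# est_sources is (batch, time, n_sources); write each estimated speaker at 8 kHz.
torchaudio.save("speaker1_hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("speaker2_hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
```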