Speech Separation
97 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is solely on the overlapping speech sources; other interference, such as music or noise, is not the main concern.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
Libraries
Use these libraries to find Speech Separation models and implementations.
Most implemented papers
A cappella: Audio-visual Singing Voice Separation
The task of isolating a target singing voice in music videos has useful applications.
MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes
We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Convolution-augmented transformers (Conformers) have recently been proposed for various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, as they can capture both local and global dependencies.
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation
In this work, deformable convolution is proposed as a solution that allows TCN models to have dynamic receptive fields (RFs), which can adapt to varying reverberation times in reverberant speech separation.
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.
Deep learning for monaural speech separation
In this paper, we study deep learning for monaural speech separation.
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition.
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation
We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem.
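The core idea of PIT is that, since there is no natural ordering of speakers, the loss is computed over every possible assignment of model outputs to reference sources, and only the best assignment is used for training. A minimal sketch of such a permutation-invariant MSE loss (the function name `pit_mse` and the NumPy implementation are illustrative, not the paper's code):

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE.

    Tries every assignment of estimated sources to reference sources
    and returns the lowest average error, plus the winning permutation.
    estimates, targets: lists of equal-length NumPy arrays (one per speaker).
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        # perm[i] is the target index assigned to estimate i
        loss = sum(np.mean((estimates[i] - targets[p]) ** 2)
                   for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Exhaustive search over permutations is factorial in the number of speakers, which is acceptable for the two- or three-talker mixtures typically considered.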
Deep attractor network for single-microphone speaker separation
We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals, which pull together the time-frequency bins corresponding to each source.
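The attractor mechanism can be sketched as follows: each source's attractor is the mean of the embeddings of the time-frequency bins assigned to that source (oracle assignments at training time), and separation masks are obtained from the similarity between each bin's embedding and each attractor. The function name `compute_masks` and this NumPy formulation are an illustrative simplification, not the paper's implementation:

```python
import numpy as np

def compute_masks(emb, assign):
    """Attractor-style mask estimation (simplified sketch).

    emb:    (TF, D) array of embeddings, one per time-frequency bin.
    assign: (TF, C) one-hot source assignments for each bin.
    Returns (TF, C) soft masks from a softmax over bin-attractor similarity.
    """
    # Attractor for each source = mean embedding of its bins, shape (C, D).
    attractors = (assign.T @ emb) / (assign.sum(axis=0, keepdims=True).T + 1e-8)
    logits = emb @ attractors.T                      # (TF, C) similarities
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # softmax over sources
```

At inference time, when oracle assignments are unavailable, attractors must be estimated differently (e.g., from clustering or learned anchors), which is a central design question in this line of work.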
Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization.
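The unfolding idea is that each iteration of a sparse-coding solver such as ISTA becomes one layer of a recurrent network, so the trained weights retain the interpretation of NMF quantities. A minimal sketch of unfolded nonnegative iterative thresholding (the function name `ista_unfolded`, the step size `alpha`, and the sparsity weight `lam` are illustrative assumptions):

```python
import numpy as np

def ista_unfolded(x, W, n_layers=20, alpha=0.5, lam=0.1):
    """Unfolded ISTA for nonnegative sparse coding of x ~ W @ h.

    Each 'layer' performs one gradient step on the reconstruction
    error followed by a nonnegative soft-threshold (shrinkage), the
    update that deep recurrent NMF unrolls into network layers.
    """
    h = np.zeros(W.shape[1])
    for _ in range(n_layers):
        grad = W.T @ (W @ h - x)                      # gradient of 0.5*||W h - x||^2
        h = np.maximum(h - alpha * grad - alpha * lam, 0.0)  # shrink + project
    return h
```

Because the layers correspond to solver iterations, initializing the network from the known solver parameters gives the principled initialization the summary refers to, rather than starting from random weights.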