Speech Separation

96 papers with code • 18 benchmarks • 16 datasets

Speech Separation is the task of extracting all of the overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is restricted to the overlapping speech sources; other interference, such as music or noise signals, is not the main concern.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
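
To make the task concrete: the mixture waveform is (approximately) the sum of the individual speaker signals, and a model must estimate each source from the sum alone. Estimates in this literature are typically scored with the scale-invariant signal-to-noise ratio (SI-SNR); the NumPy sketch of the metric below is illustrative and not taken from any specific paper on this page.

import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference source."""
    # Zero-mean both signals so DC offsets do not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target: the part of the estimate
    # that is "the target up to scale".
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# Toy check: the raw mixture is a poor estimate of either source,
# while any rescaled copy of the source itself scores near-perfectly.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(16000), rng.standard_normal(16000)
print(si_snr(s1 + s2, s1))   # roughly 0 dB
print(si_snr(0.5 * s1, s1))  # very high: the metric is scale-invariant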

Most implemented papers

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

naplab/Conv-TasNet 20 Sep 2018

The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms.
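
Conv-TasNet is also implemented in the asteroid library that appears elsewhere on this page; a minimal usage sketch, assuming asteroid's default hyperparameters (the exact constructor signature should be checked against the asteroid documentation):

import torch
from asteroid.models import ConvTasNet

# Conv-TasNet works directly on the waveform: a learned 1-D conv encoder
# replaces the STFT, a temporal convolutional network predicts one mask
# per source, and a transposed-conv decoder resynthesizes each waveform.
model = ConvTasNet(n_src=2)      # separate two overlapping speakers

mixture = torch.randn(1, 16000)  # (batch, time): 1 s of audio at 16 kHz
est_sources = model(mixture)     # (batch, n_src, time)
print(est_sources.shape)         # torch.Size([1, 2, 16000])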

Deep clustering: Discriminative embeddings for segmentation and separation

mpariente/asteroid 18 Aug 2015

The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources.
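
Training pushes embeddings of time-frequency bins dominated by the same source to cluster together by minimizing ||VV^T - YY^T||_F^2, where V holds unit-norm per-bin embeddings and Y one-hot source assignments. A PyTorch sketch of this loss, using the standard expansion that never materializes the (TF x TF) affinity matrices (shapes and names are illustrative):

import torch

def deep_clustering_loss(embedding, labels):
    """||VV^T - YY^T||_F^2 via ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2.

    embedding: (batch, T*F, D) unit-norm embeddings V
    labels:    (batch, T*F, C) one-hot source assignments Y
    """
    vtv = torch.bmm(embedding.transpose(1, 2), embedding)  # (batch, D, D)
    vty = torch.bmm(embedding.transpose(1, 2), labels)     # (batch, D, C)
    yty = torch.bmm(labels.transpose(1, 2), labels)        # (batch, C, C)
    return (vtv.pow(2).sum() - 2 * vty.pow(2).sum() + yty.pow(2).sum()) / embedding.shape[0]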

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

mpariente/asteroid 14 Oct 2019

Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches to conventional time-frequency-based methods.
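
Dual-path RNN splits the long encoded sequence into short chunks and alternates an intra-chunk RNN (local modeling) with an inter-chunk RNN (global modeling), so no single RNN ever unrolls over the full sequence. A simplified PyTorch sketch of one dual-path block, assuming non-overlapping chunks (real implementations such as asteroid's add 50% chunk overlap and normalization):

import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """One dual-path block: intra-chunk RNN, then inter-chunk RNN."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.intra = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.inter = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.intra_proj = nn.Linear(2 * hidden, dim)
        self.inter_proj = nn.Linear(2 * hidden, dim)

    def forward(self, x):
        # x: (batch, num_chunks, chunk_len, dim)
        b, k, s, d = x.shape
        # Intra-chunk pass: run the RNN inside each chunk (local context).
        intra = self.intra_proj(self.intra(x.reshape(b * k, s, d))[0])
        x = x + intra.reshape(b, k, s, d)
        # Inter-chunk pass: run the RNN across chunks at each within-chunk
        # position (global context at a stride of one chunk).
        inter = x.transpose(1, 2).reshape(b * s, k, d)
        inter = self.inter_proj(self.inter(inter)[0])
        return x + inter.reshape(b, s, k, d).transpose(1, 2)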

Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

meokz/looking-to-listen 10 Apr 2018

Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video.

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

ujscjj/DPTNet Interspeech 2020

By introducing an improved transformer in which the elements of a speech sequence interact directly, DPTNet can model speech sequences with direct context-awareness.
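
Concretely, the paper's improved transformer replaces the first linear layer of the feed-forward sublayer with a recurrent layer, so order information is learned without positional encodings. A rough PyTorch sketch of such a layer (dimensions, activation, and norm placement are illustrative):

import torch
import torch.nn as nn

class ImprovedTransformerLayer(nn.Module):
    """Transformer layer whose feed-forward part starts with an RNN."""
    def __init__(self, dim, n_heads, hidden):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.rnn = nn.GRU(dim, hidden, bidirectional=True, batch_first=True)
        self.ffn_out = nn.Linear(2 * hidden, dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # Self-attention: every element attends to every other directly.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # RNN-based feed-forward injects sequence order information.
        ffn = self.ffn_out(torch.relu(self.rnn(x)[0]))
        return self.norm2(x + ffn)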

Voice Separation with an Unknown Number of Multiple Speakers

facebookresearch/svoice ICML 2020

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously.

Sudo rm -rf: Efficient Networks for Universal Audio Source Separation

etzinis/sudo_rm_rf 14 Jul 2020

In this paper, we present an efficient neural network for end-to-end general purpose audio source separation.

Attention is All You Need in Speech Separation

speechbrain/speechbrain 25 Oct 2020

Transformers are emerging as a natural alternative to standard RNNs, replacing recurrent computations with a multi-head attention mechanism.
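
The resulting SepFormer model is distributed with SpeechBrain, so separating a two-speaker mixture can be sketched roughly as follows (model identifier and API as documented by SpeechBrain at the time of writing; verify against the current documentation):

from speechbrain.pretrained import SepformerSeparation

# Fetch a SepFormer trained on the WSJ0-2mix benchmark and run it on a file.
model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix",
    savedir="pretrained_models/sepformer-wsj02mix",
)
est_sources = model.separate_file(path="mixture.wav")  # (batch, time, n_src)
print(est_sources.shape)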

Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks

snsun/pit-speech-separation 18 Mar 2017

We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet).
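
The core of uPIT is to evaluate the loss under every possible output-to-speaker assignment, held fixed across the whole utterance, and back-propagate only the best one, which resolves the label permutation problem. A minimal PyTorch sketch with an MSE loss (the paper applies this to spectral masks; the loss choice and shapes here are illustrative):

import itertools
import torch

def upit_loss(estimates, targets):
    """Utterance-level PIT: min over speaker permutations of the mean loss.

    estimates, targets: (batch, n_src, time)
    """
    n_src = estimates.shape[1]
    per_perm = []
    for perm in itertools.permutations(range(n_src)):
        # One fixed assignment for the entire utterance, as in uPIT.
        mse = ((estimates[:, list(perm), :] - targets) ** 2).mean(dim=(1, 2))
        per_perm.append(mse)
    # Choose the best permutation independently for each mixture in the batch.
    return torch.stack(per_perm).min(dim=0).values.mean()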

TasNet: time-domain audio separation network for real-time, single-channel speech separation

mpariente/asteroid 1 Nov 2017

We directly model the signal in the time-domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs.
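
That encoder-mask-decoder skeleton can be sketched in a few lines of PyTorch; the 1x1 convolution below is a stand-in for TasNet's LSTM separator, and all sizes are illustrative:

import torch
import torch.nn as nn

class TinyTasNet(nn.Module):
    """Encoder -> per-source masks -> decoder, all in the time domain."""
    def __init__(self, n_src=2, n_filters=256, kernel=40):
        super().__init__()
        stride = kernel // 2
        self.encoder = nn.Conv1d(1, n_filters, kernel, stride=stride)
        # Stand-in separator: predicts one sigmoid mask per source.
        self.masker = nn.Sequential(
            nn.Conv1d(n_filters, n_src * n_filters, 1), nn.Sigmoid()
        )
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel, stride=stride)
        self.n_src, self.n_filters = n_src, n_filters

    def forward(self, mix):
        # mix: (batch, time). ReLU keeps the encoder output nonnegative.
        w = torch.relu(self.encoder(mix.unsqueeze(1)))  # (batch, N, frames)
        masks = self.masker(w).view(-1, self.n_src, self.n_filters, w.shape[-1])
        sources = masks * w.unsqueeze(1)                # mask out each source
        # Decode each masked representation back to a waveform.
        out = self.decoder(sources.reshape(-1, self.n_filters, w.shape[-1]))
        return out.view(mix.shape[0], self.n_src, -1)   # (batch, n_src, time')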