Speech Separation
96 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is solely on the overlapping speech sources; other interference, such as music or noise, is not the main concern.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
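To make the evaluation side of the task concrete, below is a minimal, dependency-free sketch of the scale-invariant signal-to-noise ratio (SI-SNR), a metric commonly used to score separated speech against a reference source. The function name and toy signals here are illustrative, not taken from any specific paper on this page:

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference source."""
    # Remove the mean from both signals (zero-mean is assumed by SI-SNR).
    est = [e - sum(estimate) / len(estimate) for e in estimate]
    tgt = [t - sum(target) / len(target) for t in target]
    # Project the estimate onto the target to get the scaled target component.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    # The residual is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(n * n for n in e_noise) + eps
    return 10 * math.log10(num / den + eps)
```

A perfect (or merely rescaled) estimate yields a very high SI-SNR, while a noisy estimate scores lower; separation systems are typically trained to maximize this quantity.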
Libraries
Use these libraries to find Speech Separation models and implementations

Latest papers with no code
Robust Active Speaker Detection in Noisy Environments
Experiments demonstrate that non-speech audio noises significantly impact ASD models, and our proposed approach improves ASD performance in noisy environments.
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings
A major drawback of supervised speech separation (SSep) systems is their reliance on synthetic data, leading to poor real-world generalization.
Probing Self-supervised Learning Models with Target Speech Extraction
TSE uniquely requires both speaker identification and speech separation, distinguishing it from other tasks in the Speech processing Universal PERformance Benchmark (SUPERB) evaluation.
Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation
We propose mixture to mixture (M2M) training, a weakly-supervised neural speech separation algorithm that leverages close-talk mixtures as a weak supervision for training discriminative models to separate far-field mixtures.
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor
We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers.
Resource-constrained stereo singing voice cancellation
We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix.
Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework.
Hyperbolic Distance-Based Speech Separation
In this work, we explore the task of hierarchical distance-based speech separation defined on a hyperbolic manifold.
Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal.
Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation
Despite its success, previous studies showed that PIT is plagued by excessive label-assignment switching in adjacent epochs, impeding the model from learning better label assignments.
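The permutation invariant training (PIT) objective referenced above scores every assignment of estimated sources to reference sources and takes the minimum, which is why the assignment can switch between epochs. A minimal pure-Python sketch of that idea (function names and the MSE pairwise loss are illustrative choices, not the paper's implementation):

```python
from itertools import permutations

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_loss(estimates, targets, pairwise_loss=mse):
    """Minimum loss over all estimate-to-target assignments, plus the winning permutation."""
    best, best_perm = None, None
    for perm in permutations(range(len(targets))):
        total = sum(pairwise_loss(estimates[i], targets[p])
                    for i, p in enumerate(perm))
        if best is None or total < best:
            best, best_perm = total, perm
    return best / len(targets), best_perm
```

Because the winning permutation is recomputed on every batch, two similar epochs can settle on different assignments, which is the switching behavior the paper above aims to reduce.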