Speech Separation
97 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise signals, is not the main concern.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
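To make the setup concrete, here is a minimal sketch (not tied to any paper above) of the separation problem: two synthetic "speakers" are summed into one mixture, and estimates are scored with scale-invariant SNR (SI-SNR), a metric widely used to evaluate separation quality. The sine-tone signals simply stand in for speech.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB, a standard separation metric."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the target to isolate the matched component.
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# Two synthetic "speakers" (sine tones stand in for speech).
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 220 * t)
s2 = np.sin(2 * np.pi * 330 * t)
mixture = s1 + s2  # the observed overlapped signal

# A separator must recover s1 and s2 from the mixture. Compare the
# trivial "estimate = mixture" baseline against the oracle estimate.
print(si_snr(mixture, s1))  # low: the mixture still contains s2
print(si_snr(s1, s1))       # very high: a perfect estimate
```

A trained separation model would replace the trivial baseline here, mapping `mixture` to per-speaker estimates that maximize SI-SNR against each reference.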
Libraries
Use these libraries to find Speech Separation models and implementations

Latest papers with no code
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism.
Real-time Speech Enhancement and Separation with a Unified Deep Neural Network for Single/Dual Talker Scenarios
Unlike existing solutions, which focus on modifying the loss function to accommodate zero-energy target signals, the proposed approach circumvents this problem by training the model to extract speech on both of its output channels regardless of whether the input is a single- or dual-talker mixture.
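For context, multi-talker models are commonly trained with permutation-invariant training (PIT), which scores every speaker-to-channel assignment and keeps the best one; SI-SNR-style PIT losses become ill-defined when a target channel is silent (zero energy), which is the situation the paper above sidesteps. This is a generic PIT sketch with an MSE objective, not the paper's method:

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE over output channels:
    evaluate every speaker-to-channel assignment, keep the best."""
    best = np.inf
    for perm in itertools.permutations(range(len(targets))):
        mse = np.mean([np.mean((estimates[i] - targets[p]) ** 2)
                       for i, p in enumerate(perm)])
        best = min(best, mse)
    return best

t = np.linspace(0, 1, 1000)
a, b = np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 9 * t)
# Even with the output channels swapped, PIT finds the matching assignment.
print(pit_mse([b, a], [a, b]))  # 0.0: perfect under the best permutation
```

An MSE objective stays finite for a zero-energy target, whereas an SI-SNR loss would divide by the silent target's energy, which motivates the loss modifications the paper avoids.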
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction
We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting.
GASS: Generalizing Audio Source Separation with Large-scale Data
Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset.
Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition
This mixture encoder leverages the original overlapped speech to mitigate the effect of artifacts introduced by the speech separation.
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.
IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation
Recent research has made significant progress in designing fusion modules for audio-visual speech separation.
Improving Deep Attractor Network by BGRU and GMM for Speech Separation
The Deep Attractor Network (DANet), which uses Bidirectional Long Short-Term Memory (BLSTM), is a state-of-the-art technique in the speech separation field, but the complexity of the DANet model is very high.
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
The cocktail party problem is the scenario in which it is difficult to separate or distinguish individual speakers in mixed speech from several speakers.