Speech Separation

97 papers with code • 18 benchmarks • 16 datasets

Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which only the overlapping speech sources are of interest; other interference, such as music or noise signals, is not the main concern.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
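As a rough illustration of the setup (not taken from any of the papers below): a single-channel mixture is the sum of the individual speech sources, and separated estimates are commonly scored with scale-invariant SNR (SI-SNR). A minimal NumPy sketch, using pure tones as stand-in "speakers":

```python
import numpy as np

t = np.arange(8000) / 8000.0
s1 = np.sin(2 * np.pi * 220 * t)   # stand-in "speaker" 1
s2 = np.sin(2 * np.pi * 330 * t)   # stand-in "speaker" 2
mixture = s1 + s2                  # observed single-channel signal

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB, a common separation metric."""
    ref_zm = ref - ref.mean()
    est_zm = est - est.mean()
    proj = (est_zm @ ref_zm) / (ref_zm @ ref_zm + eps) * ref_zm
    noise = est_zm - proj
    return 10 * np.log10((proj @ proj) / (noise @ noise + eps))

# A perfect estimate scores far higher than the unprocessed mixture.
print(si_snr(s1, s1) > si_snr(mixture, s1))
```

The separation models listed below differ in how they estimate each source from the mixture, but most report results in terms of SI-SNR improvement over the mixture.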

Libraries

Use these libraries to find Speech Separation models and implementations
See all 6 libraries.

Most implemented papers

A cappella: Audio-visual Singing Voice Separation

JuanFMontesinos/Acappella-YNet 20 Apr 2021

The task of isolating a target singing voice in music videos has useful applications.

MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

anton-jeran/MESH2IR 18 May 2022

We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

ruizhecao96/cmgan 22 Sep 2022

Convolution-augmented transformers (Conformers) have recently been proposed for various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, as they can capture both local and global dependencies.

Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

jwr1995/dtcn 27 Oct 2022

In this work, deformable convolution is proposed as a way to give TCN models dynamic receptive fields (RFs) that can adapt to varying reverberation times in reverberant speech separation.

An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits

jusperlee/lrs3-for-speech-separation 21 Dec 2022

Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.

Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network

ishandutta2007/Speech-Denoising-Landscape 17 Apr 2015

Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition.

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

JusperLee/UtterancePIT-Speech-Separation 1 Jul 2016

We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem.
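The core of PIT can be sketched independently of the network itself: evaluate the loss under every assignment of model outputs to reference speakers and keep the minimum, so training no longer depends on an arbitrary speaker labeling order. A minimal NumPy sketch (brute-force permutation search, which is practical for small speaker counts):

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE: try every output-to-speaker
    assignment and return the lowest loss and its permutation."""
    n = len(targets)
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean([np.mean((estimates[i] - targets[p]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# The loss is identical whichever output slot each speaker lands in.
a, b = np.ones(10), np.zeros(10)
l1, _ = pit_mse([a, b], [a, b])
l2, _ = pit_mse([b, a], [a, b])
print(l1 == l2)
```

In the actual paper this minimum-over-permutations loss is backpropagated through a deep network; the sketch only shows why label permutation stops being a problem.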

Deep attractor network for single-microphone speaker separation

KMASAHIRO/DANet 27 Nov 2016

We propose a novel deep learning framework for single-channel speech separation by creating attractor points in a high-dimensional embedding space of the acoustic signals, which pull together the time-frequency bins corresponding to each source.

Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding

stwisdom/dr-nmf 21 Sep 2017

This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization.
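The unfolding idea behind DR-NMF can be sketched with one nonnegative ISTA iteration for sparse coding of a spectrogram V against a fixed dictionary W; in the paper, such iterations become the layers of a recurrent network whose parameters are then trained by backpropagation. A hypothetical NumPy sketch (variable names are my own, not the paper's):

```python
import numpy as np

def ista_step(H, V, W, alpha, lam=0.01):
    """One unfolded iteration: gradient step on 0.5*||V - WH||^2,
    then a nonnegative soft threshold (shrinkage + ReLU)."""
    grad = W.T @ (W @ H - V)
    return np.maximum(H - alpha * grad - alpha * lam, 0.0)

rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(16, 8)))     # nonnegative dictionary
H_true = np.abs(rng.normal(size=(8, 5)))
V = W @ H_true                           # synthetic "spectrogram"

H = np.zeros((8, 5))
alpha = 1.0 / np.linalg.norm(W.T @ W, 2) # step size from spectral norm
errs = []
for _ in range(50):                      # 50 "layers" of the unrolled net
    H = ista_step(H, V, W, alpha)
    errs.append(np.linalg.norm(V - W @ H))
print(errs[-1] < errs[0])
```

Initializing the unrolled network's weights from W and the ISTA step size is exactly the kind of principled initialization the abstract refers to.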