Speech Separation
97 papers with code • 18 benchmarks • 16 datasets
Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is solely on the overlapping speech sources; other interference, such as music or noise, is not the main concern.
Source: A Unified Framework for Speech Separation
Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
Libraries
Use these libraries to find Speech Separation models and implementations.
Most implemented papers
A cappella: Audio-visual Singing Voice Separation
The task of isolating a target singing voice in music videos has useful applications.
MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes
We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Convolution-augmented transformers (Conformers) have recently been proposed for various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, as they can capture both local and global dependencies.
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation
In this work, deformable convolution is proposed as a solution that allows TCN models to have dynamic receptive fields (RFs), which can adapt to varying reverberation times in reverberant speech separation.
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.
Deep learning for monaural speech separation
In this paper, we study deep learning for monaural speech separation.
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition.
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation
We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem.
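The core idea of PIT is that, since there is no natural ordering of speakers, the loss is computed over every possible assignment of model outputs to reference sources, and only the best assignment is used for training. A minimal sketch of such a permutation-invariant MSE loss (the function name `pit_mse` and the NumPy implementation are illustrative, not the paper's code):

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation-invariant MSE.

    Tries every assignment of estimated sources to reference sources
    and returns the lowest average error, plus the winning permutation.
    estimates, targets: lists of equal-length NumPy arrays (one per speaker).
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        # perm[i] is the target index assigned to estimate i
        loss = sum(np.mean((estimates[i] - targets[p]) ** 2)
                   for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Exhaustive search over permutations is factorial in the number of speakers, which is acceptable for the two- or three-talker mixtures typically considered.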
Deep attractor network for single-microphone speaker separation
We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals, which pull together the time-frequency bins corresponding to each source.
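The attractor mechanism can be sketched as follows: each source's attractor is the mean of the embeddings of the time-frequency bins assigned to that source (oracle assignments at training time), and separation masks are obtained from the similarity between each bin's embedding and each attractor. The function name `compute_masks` and this NumPy formulation are an illustrative simplification, not the paper's implementation:

```python
import numpy as np

def compute_masks(emb, assign):
    """Attractor-style mask estimation (simplified sketch).

    emb:    (TF, D) array of embeddings, one per time-frequency bin.
    assign: (TF, C) one-hot source assignments for each bin.
    Returns (TF, C) soft masks from a softmax over bin-attractor similarity.
    """
    # Attractor for each source = mean embedding of its bins, shape (C, D).
    attractors = (assign.T @ emb) / (assign.sum(axis=0, keepdims=True).T + 1e-8)
    logits = emb @ attractors.T                      # (TF, C) similarities
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)          # softmax over sources
```

At inference time, when oracle assignments are unavailable, attractors must be estimated differently (e.g., from clustering or learned anchors), which is a central design question in this line of work.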
Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding
This interpretability also provides principled initializations that enable faster training and convergence to better solutions compared to conventional random initialization.
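The unfolding idea is that each iteration of a sparse-coding solver such as ISTA becomes one layer of a recurrent network, so the trained weights retain the interpretation of NMF quantities. A minimal sketch of unfolded nonnegative iterative thresholding (the function name `ista_unfolded`, the step size `alpha`, and the sparsity weight `lam` are illustrative assumptions):

```python
import numpy as np

def ista_unfolded(x, W, n_layers=20, alpha=0.5, lam=0.1):
    """Unfolded ISTA for nonnegative sparse coding of x ~ W @ h.

    Each 'layer' performs one gradient step on the reconstruction
    error followed by a nonnegative soft-threshold (shrinkage), the
    update that deep recurrent NMF unrolls into network layers.
    """
    h = np.zeros(W.shape[1])
    for _ in range(n_layers):
        grad = W.T @ (W @ h - x)                      # gradient of 0.5*||W h - x||^2
        h = np.maximum(h - alpha * grad - alpha * lam, 0.0)  # shrink + project
    return h
```

Because the layers correspond to solver iterations, initializing the network from the known solver parameters gives the principled initialization the summary refers to, rather than starting from random weights.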