Speech Separation

97 papers with code • 18 benchmarks • 16 datasets

Speech Separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise signals, is not the main concern of the study.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks
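A common baseline for speech separation is time-frequency masking: take the STFT of the mixture, assign each time-frequency bin to one source, and invert. The sketch below is illustrative only, not any method from the papers listed here: it uses synthetic tones as stand-ins for speakers and an *oracle* ideal binary mask computed from the clean sources, so it shows the mechanics (and an upper bound) rather than a real separation model.

```python
import numpy as np
from scipy.signal import stft, istft

sr = 8000
t = np.arange(sr) / sr
# Two synthetic "speakers": amplitude-modulated tones standing in for speech
s1 = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
s2 = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.cos(2 * np.pi * 2 * t))
mix = s1 + s2

# STFTs of the mixture and of the (oracle) clean sources
_, _, S1 = stft(s1, sr, nperseg=256)
_, _, S2 = stft(s2, sr, nperseg=256)
_, _, M = stft(mix, sr, nperseg=256)

# Ideal binary mask: each T-F bin goes to whichever source dominates it
mask1 = (np.abs(S1) >= np.abs(S2)).astype(float)
est1 = istft(mask1 * M, sr, nperseg=256)[1][: len(s1)]
est2 = istft((1.0 - mask1) * M, sr, nperseg=256)[1][: len(s2)]

def snr_db(ref, est):
    """Signal-to-noise ratio of an estimate against a reference, in dB."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

# Separation should recover each source far better than the raw mixture does
print(snr_db(s1, est1), snr_db(s1, mix))
```

In a learned system (e.g. deep clustering or attractor networks), a neural network predicts the mask from the mixture alone; the oracle mask here only demonstrates what masking itself can achieve.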


Latest papers with no code

Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model

no code yet • 30 Oct 2023

For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism.

Real-time Speech Enhancement and Separation with a Unified Deep Neural Network for Single/Dual Talker Scenarios

no code yet • 16 Oct 2023

Unlike existing solutions that focus on modifying the loss function to accommodate zero-energy target signals, the proposed approach circumvents this problem by training the model to extract speech on both of its output channels regardless of whether the input is a single- or dual-talker mixture.

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction

no code yet • 12 Oct 2023

We propose a multi-task universal speech enhancement (MUSE) model that can perform five speech enhancement (SE) tasks: dereverberation, denoising, speech separation (SS), target speaker extraction (TSE), and speaker counting.

GASS: Generalizing Audio Source Separation with Large-scale Data

no code yet • 29 Sep 2023

Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

no code yet • 28 Sep 2023

We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset.

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

no code yet • 15 Sep 2023

This mixture encoder leverages the original overlapped speech to mitigate the effect of artifacts introduced by the speech separation.

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

no code yet • 21 Aug 2023

The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs.

IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation

no code yet • 16 Aug 2023

Recent research has made significant progress in designing fusion modules for audio-visual speech separation.

Improving Deep Attractor Network by BGRU and GMM for Speech Separation

no code yet • 7 Aug 2023

Deep Attractor Network (DANet) is a state-of-the-art technique in the speech separation field that uses Bidirectional Long Short-Term Memory (BLSTM), but the complexity of the DANet model is very high.

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

no code yet • 29 Jul 2023

The cocktail party problem is the scenario in which it is difficult to separate or distinguish an individual speaker from a mixture of speech from several speakers.