SepFormer

Introduced by Subakan et al. in Attention is All You Need in Speech Separation

SepFormer is a Transformer-based neural network for speech separation. It is built mainly from multi-head attention and feed-forward layers, and it learns short- and long-term dependencies with a multi-scale approach. SepFormer adopts the dual-path framework introduced by DPRNN but replaces its RNNs with Transformers: one Transformer models dependencies within each chunk of the sequence (short-term), and another models dependencies across chunks (long-term). Because each Transformer in the dual-path framework processes only short sequences instead of the full input, the framework mitigates the quadratic complexity of Transformers.
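
The chunking that makes the dual-path framework cheaper can be sketched as follows. This is a minimal NumPy sketch, assuming DPRNN-style segmentation into chunks with 50% overlap and zero-padding at the end; the function names `segment`, `dual_path_sequences`, and `attention_scores` and the example sizes are hypothetical illustrations, not values from the paper. It shows the two sequence layouts a dual-path block works on and counts attention-score entries to make the complexity argument concrete.

```python
import numpy as np


def segment(x: np.ndarray, chunk_size: int) -> np.ndarray:
    """Split a (features, time) array into 50%-overlapping chunks.

    Returns an array of shape (features, chunk_size, num_chunks).
    Assumes an even chunk_size so the hop is exactly half a chunk.
    """
    length = x.shape[1]
    hop = chunk_size // 2
    # Zero-pad the end so the chunks tile the padded sequence exactly.
    if length <= chunk_size:
        pad = chunk_size - length
    else:
        pad = (hop - (length - chunk_size) % hop) % hop
    x = np.pad(x, ((0, 0), (0, pad)))
    num_chunks = (x.shape[1] - chunk_size) // hop + 1
    chunks = [x[:, i * hop : i * hop + chunk_size] for i in range(num_chunks)]
    return np.stack(chunks, axis=-1)


def dual_path_sequences(chunks: np.ndarray):
    """Rearrange chunks into the two batches of sequences a dual-path block sees.

    intra: one sequence per chunk, running over positions inside that chunk
           -> shape (num_chunks, chunk_size, features)
    inter: one sequence per within-chunk position, running across chunks
           -> shape (chunk_size, num_chunks, features)
    """
    intra = chunks.transpose(2, 1, 0)
    inter = chunks.transpose(1, 2, 0)
    return intra, inter


def attention_scores(length: int, chunk_size: int):
    """Count attention-score entries: full attention vs. intra-chunk + inter-chunk."""
    hop = chunk_size // 2
    # Same chunk count as segment(): ceil((length - chunk_size) / hop) + 1, at least 1.
    num_chunks = max(1, -(-(length - chunk_size) // hop) + 1)
    full = length ** 2
    intra = num_chunks * chunk_size ** 2  # each chunk attends within itself
    inter = chunk_size * num_chunks ** 2  # each position attends across chunks
    return full, intra + inter


x = np.random.default_rng(0).standard_normal((64, 1000))  # hypothetical encoder output
chunks = segment(x, chunk_size=100)
intra, inter = dual_path_sequences(chunks)
print(chunks.shape, intra.shape, inter.shape)  # (64, 100, 19) (19, 100, 64) (100, 19, 64)
print(attention_scores(1000, 100))  # (1000000, 226100)
```

For these illustrative sizes the dual-path layout needs about 23% of the score entries of full attention. The inter-chunk term still grows quadratically with length, but over the number of chunks rather than the number of frames.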

The model follows the learned-domain masking approach and consists of an encoder, a masking network, and a decoder, as shown in the figure. The encoder is fully convolutional, while the masking network employs two Transformers embedded inside the dual-path processing block. Finally, the decoder reconstructs the separated signals in the time domain using the masks predicted by the masking network.
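
The encoder, mask, and decoder stages can be illustrated with a minimal NumPy sketch, assuming a single strided 1-D convolution followed by ReLU as the encoder and a transposed 1-D convolution (written here as overlap-add) as the decoder. `dummy_masking_net` is a hypothetical stand-in for the dual-path Transformer masking network: it only produces non-negative masks that sum to one across sources, which is not how SepFormer computes its masks. All function names, filter counts, and strides are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(x, basis, stride):
    """Encoder: strided 1-D convolution with F learned filters, then ReLU.

    x: (T,) mixture waveform; basis: (F, L) filters. Returns h: (F, frames).
    """
    win = basis.shape[1]
    frames = (len(x) - win) // stride + 1
    # Each column is one analysis window of the input: (L, frames)
    windows = np.stack([x[i * stride : i * stride + win] for i in range(frames)], axis=1)
    return np.maximum(basis @ windows, 0.0)


def decode(h, basis, stride):
    """Decoder: transposed 1-D convolution written as overlap-add.

    h: (F, frames); basis: (F, L). Returns a waveform of length (frames - 1) * stride + L.
    """
    win = basis.shape[1]
    frames = h.shape[1]
    segments = basis.T @ h  # one synthesized segment per frame: (L, frames)
    out = np.zeros((frames - 1) * stride + win)
    for i in range(frames):
        out[i * stride : i * stride + win] += segments[:, i]
    return out


def dummy_masking_net(h, num_sources):
    """Hypothetical stand-in for the dual-path Transformer masking network.

    Returns (num_sources, F, frames) masks: a softmax over random logits, so the
    masks are non-negative and sum to one across sources. SepFormer learns its masks.
    """
    logits = rng.standard_normal((num_sources,) + h.shape)
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


def separate(x, enc_basis, dec_basis, stride, masking_net, num_sources):
    """Encode the mixture, mask the representation once per source, decode each."""
    h = encode(x, enc_basis, stride)
    masks = masking_net(h, num_sources)
    return np.stack([decode(m * h, dec_basis, stride) for m in masks])


# Illustrative sizes: 64 filters of length 16 with stride 8 (assumptions, not paper values)
mixture = rng.standard_normal(1600)
enc_basis = rng.standard_normal((64, 16))
dec_basis = rng.standard_normal((64, 16))
estimates = separate(mixture, enc_basis, dec_basis, 8, dummy_masking_net, num_sources=2)
print(estimates.shape)  # (2, 1600)
```

Because the decoder is linear and the stand-in masks sum to one, the estimates add back up to the decoding of the unmasked encoder output; masks produced by a trained masking network are not necessarily constrained this way.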

Source: Attention is All You Need in Speech Separation

Tasks

Task	Papers	Share
Speech Separation	7	43.75%
Speech Enhancement	2	12.50%
Speech Extraction	1	6.25%
Audio Source Separation	1	6.25%
Generalization Bounds	1	6.25%
Multi-Speaker Source Separation	1	6.25%
Speaker Verification	1	6.25%
Target Speaker Extraction	1	6.25%
Denoising	1	6.25%

Components

Component	Type
Layer Normalization	Normalization
Linear Layer	Feedforward Networks
Multi-Head Attention	Attention Modules
Position-Wise Feed-Forward Layer	Feedforward Networks
PReLU	Activation Functions
ReLU	Activation Functions
Residual Connection	Skip Connections
Scaled Dot-Product Attention	Attention Mechanisms
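
The listed components fit together roughly as in the following NumPy sketch of one Transformer layer: layer normalization, multi-head scaled dot-product attention with linear projections, a position-wise feed-forward layer with ReLU, and a residual connection around each sublayer. It assumes a pre-norm arrangement and omits learnable layer-norm scale/shift, dropout, and positional encoding; the weights are hypothetical random values. It illustrates the components, not SepFormer's exact block.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each feature vector (last axis); learnable scale/shift omitted."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v, batched over heads. Shapes: (heads, seq, d_head)."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def multi_head_attention(x, p, num_heads):
    """Project x (seq, d_model) to queries/keys/values, attend per head, merge heads."""
    seq, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    heads = scaled_dot_product_attention(split(x @ p["w_q"]), split(x @ p["w_k"]), split(x @ p["w_v"]))
    merged = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return merged @ p["w_o"]  # output linear layer


def feed_forward(x, p):
    """Position-wise feed-forward layer: Linear -> ReLU -> Linear at every position."""
    return np.maximum(x @ p["w1"] + p["b1"], 0.0) @ p["w2"] + p["b2"]


def transformer_layer(x, p, num_heads):
    """Pre-norm layer with a residual connection around each sublayer."""
    x = x + multi_head_attention(layer_norm(x), p, num_heads)
    x = x + feed_forward(layer_norm(x), p)
    return x


def init_params(d_model, d_ff, rng):
    """Hypothetical random weights, for illustration only."""
    s = 1.0 / np.sqrt(d_model)
    return {
        "w_q": rng.standard_normal((d_model, d_model)) * s,
        "w_k": rng.standard_normal((d_model, d_model)) * s,
        "w_v": rng.standard_normal((d_model, d_model)) * s,
        "w_o": rng.standard_normal((d_model, d_model)) * s,
        "w1": rng.standard_normal((d_model, d_ff)) * s,
        "b1": np.zeros(d_ff),
        "w2": rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff),
        "b2": np.zeros(d_model),
    }


rng = np.random.default_rng(0)
params = init_params(d_model=16, d_ff=32, rng=rng)
seq = rng.standard_normal((10, 16))  # 10 positions, 16 features each
out = transformer_layer(seq, params, num_heads=4)
print(out.shape)  # (10, 16)
```

In a dual-path block, a layer like this would be applied independently to every intra-chunk sequence and every inter-chunk sequence.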

Categories

Speech Separation Models