TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Separation	WSJ0-2mix	SepTDA (L=12)	SI-SDRi	24.0	# 2
Speech Separation	WSJ0-3mix	SepTDA	SI-SDRi	23.7	# 1
Speech Separation	WSJ0-4mix	SepTDA	SI-SDRi	22.0	# 1
Speech Separation	WSJ0-5mix	SepTDA	SI-SDRi	21.0	# 1

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

23 Jan 2024 · Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor (TDA) calculation module that can deal with an unknown number of speakers, and 3) triple-path processing blocks that can model inter-speaker relations. Given a fixed, small set of learned speaker queries and the mixture embedding produced by the dual-path blocks, TDA infers the relations among these queries and generates an attractor vector for each speaker. The estimated attractors are then combined with the mixture embedding through feature-wise linear modulation (FiLM) conditioning, which creates a speaker dimension. The mixture embedding, conditioned on the speaker information produced by TDA, is fed to the final triple-path blocks, which augment the dual-path blocks with an additional pathway dedicated to inter-speaker processing. The proposed approach outperforms the previous best results reported in the literature, achieving 24.0 and 23.7 dB SI-SDR improvement (SI-SDRi) on WSJ0-2mix and WSJ0-3mix, respectively, with a single model trained to separate 2- and 3-speaker mixtures. The proposed model also exhibits strong performance and generalizability at counting sources and separating mixtures with up to 5 speakers.
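
The conditioning step described in the abstract, where per-speaker attractors modulate a shared mixture embedding and thereby create a speaker dimension, can be illustrated with a minimal NumPy sketch of feature-wise linear modulation. This is not the authors' implementation: the function name `film_condition`, the projection matrices `w_gamma` and `w_beta`, and the tensor shapes are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def film_condition(mixture_emb, attractors, w_gamma, w_beta):
    """Apply FiLM once per speaker (illustrative sketch, shapes assumed).

    mixture_emb: (D, T) mixture embedding, D features over T positions
    attractors:  (S, D) one attractor vector per speaker
    w_gamma, w_beta: (D, D) hypothetical projections to scale and shift
    returns:     (S, D, T) speaker-conditioned embeddings
    """
    gamma = attractors @ w_gamma  # (S, D) per-speaker feature scales
    beta = attractors @ w_beta    # (S, D) per-speaker feature shifts
    # Broadcasting adds the speaker axis: each speaker gets its own
    # scaled and shifted copy of the same mixture embedding.
    return gamma[:, :, None] * mixture_emb[None, :, :] + beta[:, :, None]

# Toy sizes: 3 speakers, 4 features, 10 positions (all arbitrary).
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 10))
att = rng.standard_normal((3, 4))
conditioned = film_condition(
    emb, att, rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
)
print(conditioned.shape)
```

In this sketch the output has one slice per attractor, so the number of speakers in the output follows directly from the number of attractors; a later block could then process the result along that new speaker axis.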

Tasks

Speaker Separation

Speech Separation

Datasets

WSJ0-2mix

Results from the Paper

Ranked #1 on Speech Separation on WSJ0-5mix
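
The SI-SDRi metric used for these results is the gain in scale-invariant signal-to-distortion ratio of a separated signal over the unprocessed mixture, both measured against the clean reference. The following pure-Python sketch uses the standard SI-SDR definition; it is not the evaluation code behind the reported numbers, and the helper names and the toy sinusoid signals are assumptions made for illustration.

```python
import math

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Whatever the scaled reference does not explain counts as distortion.
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))

def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)

# Toy example: two orthogonal sinusoids stand in for two speakers.
N = 64
target_src = [math.sin(2 * math.pi * 3 * k / N) for k in range(N)]
interferer = [math.sin(2 * math.pi * 5 * k / N) for k in range(N)]
mixture = [s + i for s, i in zip(target_src, interferer)]
# A hypothetical separator that attenuates the interferer tenfold in
# amplitude, which corresponds to roughly a 20 dB improvement.
estimate = [s + 0.1 * i for s, i in zip(target_src, interferer)]
improvement = si_sdr_improvement(estimate, mixture, target_src)
print(round(improvement, 2))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate by any nonzero gain leaves the score essentially unchanged, which is what makes the measure scale-invariant.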
