Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
We investigate the recently proposed Time-domain Audio Separation Network (TasNet) in the task of real-time single-channel speech dereverberation. Unlike systems that take a time-frequency representation of the audio as input, TasNet replaces the time-frequency representation with an adaptive front-end learned by a time-domain convolutional non-negative autoencoder. We show that by formulating the dereverberation problem as a denoising problem, in which the direct path is separated from the reverberations, a TasNet denoising autoencoder can outperform a deep LSTM baseline operating on log-power magnitude spectrogram input in both causal and non-causal settings. We further show that adjusting the stride size in the convolutional autoencoder improves both dereverberation and separation performance.
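The adaptive front-end described above can be illustrated with a minimal pure-Python sketch, assuming a single-layer encoder: frames of length `win`, taken every `stride` samples, are projected onto a set of basis filters and rectified with a ReLU so the representation stays non-negative, and a linear decoder rebuilds the waveform by overlap-add. The ReLU, the random (untrained) filters, the sizes, and the names `encode`, `decode`, `win`, and `stride` are all hypothetical choices for illustration; in TasNet the filters are learned jointly with a separation network, which is omitted here, so no faithful reconstruction is expected.

```python
import random

def encode(x, enc_basis, win, stride):
    """Non-negative encoder: slice x into frames of length `win` every `stride`
    samples, project each frame onto the basis filters, then apply a ReLU."""
    n_frames = (len(x) - win) // stride + 1
    feats = []
    for t in range(n_frames):
        frame = x[t * stride : t * stride + win]
        # ReLU keeps every encoder output >= 0 (assumed non-negativity mechanism).
        feats.append([max(0.0, sum(b[i] * frame[i] for i in range(win))) for b in enc_basis])
    return feats

def decode(feats, dec_basis, win, stride, length):
    """Linear decoder: rebuild a length-`win` segment per frame as a weighted
    sum of decoder basis filters, then overlap-add the segments at `stride`."""
    y = [0.0] * length
    for t, weights in enumerate(feats):
        for i in range(win):
            y[t * stride + i] += sum(w * b[i] for w, b in zip(weights, dec_basis))
    return y

# Usage with hypothetical sizes and untrained (random) filters.
random.seed(0)
win, n_filters = 16, 32
x = [random.uniform(-1.0, 1.0) for _ in range(160)]
enc_basis = [[random.gauss(0.0, 0.1) for _ in range(win)] for _ in range(n_filters)]
dec_basis = [[random.gauss(0.0, 0.1) for _ in range(win)] for _ in range(n_filters)]
for stride in (win // 2, win // 4):
    feats = encode(x, enc_basis, win, stride)
    y = decode(feats, dec_basis, win, stride, len(x))
    print(stride, len(feats))  # prints "8 19", then "4 37"
```

Halving the stride roughly doubles the number of frames (19 to 37 here), so a smaller stride gives the separator finer time resolution at the cost of more frames to process per second of audio.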
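The denoising formulation of dereverberation can be sketched as follows, assuming the direct path is taken to be the room impulse response (RIR) taps within a few samples of its strongest peak. That window, the toy RIR, and the names `split_rir` and `make_denoising_pair` are hypothetical choices for illustration, not the paper's exact definition. Because convolution is linear, the reverberant input splits exactly into a direct-path target plus a reverberation term that a denoising model is trained to remove.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def split_rir(rir, direct_halfwidth):
    """Split an RIR into a direct-path part (taps within `direct_halfwidth`
    samples of the strongest tap) and the reverberant remainder."""
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    direct = [h if abs(i - peak) <= direct_halfwidth else 0.0 for i, h in enumerate(rir)]
    reverb = [h - d for h, d in zip(rir, direct)]
    return direct, reverb

def make_denoising_pair(dry, rir, direct_halfwidth=2):
    """Return (mixture, target, interference) for a denoising-style setup."""
    direct, reverb = split_rir(rir, direct_halfwidth)
    target = convolve(dry, direct)        # training target: direct path only
    interference = convolve(dry, reverb)  # reverberation treated as additive "noise"
    mixture = [t + n for t, n in zip(target, interference)]
    return mixture, target, interference

# Hypothetical toy signal and RIR (direct peak at index 2, late reflections after).
dry = [1.0, -0.5, 0.25]
rir = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.4, 0.2, 0.1]
mixture, target, interference = make_denoising_pair(dry, rir)
```

The `mixture` produced this way equals `convolve(dry, rir)` up to floating-point rounding, which is what makes the reverberant tail behave like additive noise on top of the direct-path target.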
Datasets
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
|---|---|---|---|---|---|---|
| Speech Separation | WSJ0-2mix | TasNet v2 | SI-SDRi | 13.2 dB | #28 | |
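SI-SDRi is the improvement in scale-invariant signal-to-distortion ratio (SI-SDR) of the estimate over the unprocessed mixture, both measured against the reference source. A minimal pure-Python sketch of the standard definition follows; the function names are hypothetical, signals are zero-meaned first, and the best-permutation matching used when scoring two-speaker mixtures such as WSJ0-2mix is not shown.

```python
import math

def _zero_mean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB. Projects the estimate onto the reference so
    that rescaling the estimate does not change the score."""
    est, ref = _zero_mean(estimate), _zero_mean(reference)
    alpha = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [alpha * r for r in ref]                  # scaled reference component
    noise = [e - t for e, t in zip(est, target)]       # everything else is distortion
    return 10.0 * math.log10(sum(t * t for t in target) / sum(n * n for n in noise))

def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the input mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

For example, with `reference = [1, -1, 2, -2]`, an estimate `[1.1, -0.9, 1.9, -2.1]` and a mixture `[2, 0, 1, -3]` (both residuals are orthogonal to the reference), the estimate scores 10·log10(250) dB, the mixture 10·log10(2.5) dB, and the improvement is 20 dB. The sketch assumes a non-silent reference and a non-zero residual; otherwise the division or logarithm fails.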