Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks

Deep neural networks (DNNs) have recently been applied successfully to speech enhancement, making DNN-based speech enhancement an active research area. While time-frequency masking based on the short-time Fourier transform (STFT) has been widely used for DNN-based speech enhancement in recent years, time-domain methods such as the time-domain audio separation network (TasNet) have also been proposed...
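The results table lists a model named Conv-TasNet-SNR. Assuming the suffix means a time-domain model trained to maximize the signal-to-noise ratio (SNR) between its output waveform and the clean reference, the training objective can be sketched as a negative-SNR loss. This is a minimal NumPy sketch under that assumption; the function names, the `eps` stabilizer and the synthetic test signals are illustrative choices, not taken from the paper.

```python
import numpy as np


def snr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """SNR of `estimate` relative to the clean `reference`, in decibels.

    The residual (reference - estimate) is treated as noise; `eps` guards
    against division by zero and taking the log of zero.
    """
    residual = reference - estimate
    signal_power = np.sum(reference ** 2)
    noise_power = np.sum(residual ** 2)
    return float(10.0 * np.log10((signal_power + eps) / (noise_power + eps)))


def negative_snr_loss(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Loss to minimize during training: a lower loss means a higher SNR."""
    return -snr_db(reference, estimate, eps)


# Usage with synthetic signals (assumed 16 kHz sampling rate, 1 s long).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)                   # stand-in for clean speech
noisy = clean + 0.3 * rng.standard_normal(t.shape)      # heavily corrupted estimate
better = clean + 0.05 * rng.standard_normal(t.shape)    # lightly corrupted estimate
print(negative_snr_loss(clean, noisy), negative_snr_loss(clean, better))
```

TasNet-style models are also commonly trained with a scale-invariant variant (SI-SNR); which exact SNR variant the paper uses is not stated on this page.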

Task | Dataset | Model | Metric | Value | Global rank
Speech Enhancement | Deep Noise Suppression (DNS) Challenge | Conv-TasNet-SNR | PESQ | 2.73 | #6
Speech Enhancement | Deep Noise Suppression (DNS) Challenge | Conv-TasNet-SNR | ΔPESQ | 1.15 | #1
Speech Enhancement | Deep Noise Suppression (DNS) Challenge | Noisy/unprocessed | PESQ | 1.58 | #9
Speech Dereverberation | Deep Noise Suppression (DNS) Challenge | Noisy/unprocessed | PESQ | 1.82 | #2
Speech Dereverberation | Deep Noise Suppression (DNS) Challenge | Conv-TasNet-SNR | PESQ | 2.75 | #1
Speech Dereverberation | Deep Noise Suppression (DNS) Challenge | Conv-TasNet-SNR | ΔPESQ | 0.93 | #1
Speech Enhancement | DEMAND | STFT-TCN noncausal | PESQ | 2.89 | #10
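The ΔPESQ values in the table match the PESQ gain of Conv-TasNet-SNR over the noisy/unprocessed input on the same DNS Challenge task (2.73 − 1.58 = 1.15 for enhancement, 2.75 − 1.82 = 0.93 for dereverberation). Reading ΔPESQ this way is an inference from the numbers rather than a definition given on this page; the short check below, with a hypothetical `delta_pesq` helper, reproduces both values from the listed PESQ scores.

```python
# PESQ scores copied from the DNS Challenge rows of the table.
results = {
    "Speech Enhancement": {"noisy": 1.58, "enhanced": 2.73},
    "Speech Dereverberation": {"noisy": 1.82, "enhanced": 2.75},
}


def delta_pesq(enhanced: float, noisy: float) -> float:
    """PESQ improvement of the enhanced output over the unprocessed input.

    Rounded to two decimals to match the precision reported in the table
    and to hide floating-point noise from the subtraction.
    """
    return round(enhanced - noisy, 2)


for task, scores in results.items():
    print(task, delta_pesq(scores["enhanced"], scores["noisy"]))
```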
