Speech Enhancement
218 papers with code • 12 benchmarks • 19 datasets
Speech Enhancement is a signal processing task that improves the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, which benefits applications such as voice recognition, teleconferencing, and hearing aids.
(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)
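As a concrete baseline for the task defined above, classical spectral subtraction is a useful illustration: estimate an average noise magnitude spectrum from a noise-only segment, subtract it from the magnitude of each noisy frame, and resynthesize using the noisy phase. The sketch below is plain NumPy; all names and parameters are illustrative and not taken from any paper listed on this page.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame_len=512, hop=256):
    """Minimal spectral-subtraction sketch: subtract an estimated noise
    magnitude spectrum from each noisy frame, keep the noisy phase,
    and resynthesize by (approximate) overlap-add."""
    window = np.hanning(frame_len)

    # Average magnitude spectrum of a noise-only segment.
    noise_frames = [noise_est[i:i + frame_len] * window
                    for i in range(0, len(noise_est) - frame_len, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame_len, hop):
        frame = noisy[i:i + frame_len] * window
        spec = np.fft.rfft(frame)
        # Subtract the noise estimate, flooring the magnitude at zero.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        # Reuse the noisy phase for resynthesis.
        enhanced = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame_len)
        out[i:i + frame_len] += enhanced * window  # overlap-add
    return out
```

Deep-learning enhancers replace the fixed subtraction rule with a learned mapping, but the analysis/synthesis framing (frame, transform, modify, resynthesize) is the same.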
Libraries
Use these libraries to find Speech Enhancement models and implementations.
Most implemented papers
Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices
A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain.
RHR-Net: A Residual Hourglass Recurrent Neural Network for Speech Enhancement
Most current speech enhancement models use spectrogram features that require an expensive transformation and result in phase information loss.
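The phase-loss problem mentioned in this snippet is easy to demonstrate: a magnitude spectrum alone does not determine the waveform. The toy NumPy example below (illustrative, not from the paper) reconstructs a signal from its exact magnitude spectrum, once with the true phase and once with random phase.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(2048) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)  # toy stand-in for a speech frame

spec = np.fft.rfft(clean)
mag, phase = np.abs(spec), np.angle(spec)

# Exact magnitude + true phase reconstructs the waveform exactly...
exact = np.fft.irfft(mag * np.exp(1j * phase), n=len(clean))

# ...but exact magnitude + random phase does not: the waveform is lost.
rand_phase = rng.uniform(-np.pi, np.pi, size=mag.shape)
garbled = np.fft.irfft(mag * np.exp(1j * rand_phase), n=len(clean))

print(np.allclose(exact, clean, atol=1e-8))   # phase preserved -> signal preserved
print(float(np.abs(garbled - clean).max()))   # phase discarded -> large error
```

This is why waveform-domain models such as the one proposed here avoid magnitude-only spectrogram features: enhancing magnitudes while reusing noisy phase caps the achievable quality.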
Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.
Improving GANs for Speech Enhancement
The former constrains the generators to learn a common mapping that is iteratively applied at all enhancement stages and results in a small model footprint.
Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network
Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement
This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement.
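One way to balance the two competing error types in such training is a loss that separately weights speech distortion and residual noise for a magnitude-mask enhancer. The sketch below is a hedged illustration of that idea, not the paper's implementation; `weighted_sd_loss`, `alpha`, and all other names are hypothetical.

```python
import numpy as np

def weighted_sd_loss(mask, clean_mag, noise_mag, alpha=0.5):
    """Illustrative weighted loss for a magnitude-mask enhancer.

    speech_distortion: how much of the clean speech the mask removes.
    residual_noise:    how much noise the mask lets through.
    alpha trades one against the other (names are hypothetical)."""
    speech_distortion = np.mean((clean_mag - mask * clean_mag) ** 2)
    residual_noise = np.mean((mask * noise_mag) ** 2)
    return alpha * speech_distortion + (1 - alpha) * residual_noise
```

With `alpha` near 1 the model is penalized mainly for distorting speech (gentler suppression); with `alpha` near 0 it is penalized mainly for residual noise (more aggressive suppression).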
Deep Residual-Dense Lattice Network for Speech Enhancement
Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage.
Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression
This paper introduces a dual-signal transformation LSTM network (DTLN) for real-time speech enhancement as part of the Deep Noise Suppression Challenge (DNS-Challenge).
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement
Finally, our experiments on multi-channel speech enhancement with a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models, delivering higher enhanced speech quality with fewer parameters.
A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
Over the past few years, speech enhancement methods based on deep learning have greatly surpassed traditional methods based on spectral subtraction and spectral estimation.