Speech Enhancement
218 papers with code • 12 benchmarks • 19 datasets
Speech Enhancement is a signal processing task that involves improving the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, benefiting applications such as voice recognition, teleconferencing, and hearing aids.
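As a minimal illustration of the task (not any of the systems listed below), a classical single-channel baseline is spectral subtraction: estimate an average noise magnitude spectrum from a noise-only clip, subtract it from each frame of the noisy signal, keep the noisy phase, and overlap-add. All signal names and parameters here are illustrative choices, sketched with NumPy:

```python
import numpy as np

def spectral_subtraction(noisy, noise_clip, frame_len=512, hop=256, floor=0.05):
    """Sketch of single-channel spectral subtraction: subtract an average
    noise magnitude spectrum per frame, keep the noisy phase, overlap-add."""
    window = np.hanning(frame_len)
    # Average noise magnitude over frames of a noise-only clip.
    noise_frames = [noise_clip[s:s + frame_len] * window
                    for s in range(0, len(noise_clip) - frame_len + 1, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for s in range(0, len(noisy) - frame_len + 1, hop):
        frame = noisy[s:s + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Spectral floor prevents negative magnitudes (limits "musical noise").
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        out[s:s + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase),
                                             n=frame_len) * window
        norm[s:s + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)

# Toy usage: a 440 Hz tone in white noise (all signals synthetic).
sr = 16_000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(sr)
enhanced = spectral_subtraction(noisy, 0.3 * rng.standard_normal(sr))
```

Modern neural systems, including those below, replace this hand-designed rule with learned spectral or waveform mappings, but the frame/enhance/reconstruct structure is the same.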
(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)
Libraries
Use these libraries to find Speech Enhancement models and implementations
Datasets
Subtasks
Latest papers with no code
Mel-FullSubNet: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance.
Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network
In this study, we present a novel weighting prediction approach, which explicitly learns the task relationships from downstream training information to address the core challenge of universal speech enhancement.
SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech
In this paper, we address this issue by outlining Speech Enhancement-based Curation Pipeline (SECP) which serves as a framework to onboard clean speech.
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model
Recently, Denoising Diffusion Probabilistic Models (DDPMs) have attained leading performances across a diverse range of generative tasks.
Diffusion Models for Audio Restoration
Here, we aim to show that diffusion models can combine the best of both worlds, enabling audio restoration algorithms that offer a good degree of interpretability together with remarkable sound quality.
Overview of the L3DAS23 Challenge on Audio-Visual Extended Reality
The primary goal of the L3DAS23 Signal Processing Grand Challenge at ICASSP 2023 is to promote and support collaborative research on machine learning for 3D audio signal processing, with a specific emphasis on 3D speech enhancement and 3D Sound Event Localization and Detection in Extended Reality applications.
Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN
With the rapid development of neural networks in recent years, various architectures have become remarkably effective at enhancing the magnitude spectrum of noisy speech in the single-channel speech enhancement domain.
Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge
In this paper, we present the objective and subjective evaluations of the systems that were submitted to the CHiME-7 UDASE task, and we provide an analysis of the results.
An Analysis of the Variance of Diffusion-based Speech Enhancement
Speech enhancement performance varies with the choice of the stochastic differential equation, which controls how the mean and the variance evolve along the diffusion process as environmental and Gaussian noise are added.
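As a toy illustration of how an SDE fixes the mean and variance trajectories (this is a generic Ornstein–Uhlenbeck process, not the paper's exact SDE), for dx = -θx dt + σ dW with x(0) = x0 the closed forms are mean(t) = x0·e^(-θt) and var(t) = σ²/(2θ)·(1 - e^(-2θt)), which a Euler–Maruyama simulation reproduces:

```python
import numpy as np

def ou_mean_var(x0, theta, sigma, t):
    """Closed-form mean and variance of the OU process dx = -theta*x dt + sigma dW."""
    mean = x0 * np.exp(-theta * t)
    var = sigma ** 2 / (2 * theta) * (1 - np.exp(-2 * theta * t))
    return mean, var

# Monte-Carlo check with Euler-Maruyama (all parameter values illustrative).
rng = np.random.default_rng(0)
theta, sigma, x0, T = 1.5, 0.8, 1.0, 2.0
n_steps, n_paths = 1000, 10_000
dt = T / n_steps
x = np.full(n_paths, x0)
for _ in range(n_steps):
    x += -theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
m, v = ou_mean_var(x0, theta, sigma, T)
```

Different choices of θ and σ (or time-dependent versions of them) give different variance schedules, which is the degree of freedom the paper's analysis is concerned with.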
Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-Path Structure
The algorithm runs in real time on 10 ms frames with 40 ms of look-ahead.
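Those timing figures translate directly into buffer sizes and algorithmic latency. A hedged sketch, assuming a 16 kHz sampling rate (the rate is our assumption; the abstract only gives durations in milliseconds):

```python
# Converting frame size and look-ahead into sample counts and latency.
# 16 kHz is an assumed sampling rate for illustration.
SR = 16_000
frame_ms, lookahead_ms = 10, 40

frame_samples = SR * frame_ms // 1000          # samples processed per frame
lookahead_samples = SR * lookahead_ms // 1000  # future samples buffered
latency_ms = frame_ms + lookahead_ms           # algorithmic latency per frame
```

In other words, each output frame can only be emitted once 50 ms of audio (one frame plus the look-ahead) is available, which is the minimum delay such a system adds regardless of compute speed.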