Speech Enhancement
218 papers with code • 12 benchmarks • 19 datasets
Speech Enhancement is a signal processing task that improves the quality of speech signals captured under noisy or degraded conditions. The goal is to make speech clearer, more intelligible, and more pleasant to listen to, which benefits applications such as speech recognition, teleconferencing, and hearing aids.
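A classical baseline for this task is spectral subtraction: estimate the noise spectrum, subtract it from each frame's magnitude, and resynthesize. The sketch below is a minimal illustration, not any paper's method; the function name and the assumption that the first half-second of the recording is speech-free are both hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(noisy, fs, noise_seconds=0.5):
    """Basic spectral subtraction: estimate the noise magnitude from an
    assumed speech-free lead-in, subtract it per frame, resynthesize."""
    _, _, Z = stft(noisy, fs=fs, nperseg=512)  # hop = nperseg // 2 = 256
    mag, phase = np.abs(Z), np.angle(Z)
    # Illustrative assumption: the first `noise_seconds` contain noise only.
    n_frames = max(1, int(noise_seconds * fs / 256))
    noise_mag = mag[:, :n_frames].mean(axis=1, keepdims=True)
    # Subtract the noise estimate, keeping a small spectral floor to
    # limit "musical noise" artifacts.
    clean_mag = np.maximum(mag - noise_mag, 0.05 * noise_mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return enhanced
```

Note that the noisy phase is reused unchanged here; several of the papers below (e.g. explicit phase estimation, diffusion models) exist precisely because this is a weak point of magnitude-only methods.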
( Image credit: A Fully Convolutional Neural Network For Speech Enhancement )
Libraries
Use these libraries to find Speech Enhancement models and implementations
Datasets
Subtasks
Latest papers
ICASSP 2023 Acoustic Echo Cancellation Challenge
This is the fourth AEC challenge. It is enhanced by adding a second track for personalized acoustic echo cancellation, reducing the algorithmic-plus-buffering latency to 20 ms, and including a full-band version of AECMOS.
Unsupervised speech enhancement with diffusion-based generative models
To address this issue, we introduce an alternative approach that operates in an unsupervised manner, leveraging the generative power of diffusion models.
Single and Few-step Diffusion for Generative Speech Enhancement
While the performance of typical generative diffusion algorithms drops dramatically when the number of function evaluations (NFEs) is lowered to obtain single-step diffusion, we show that our proposed method maintains steady performance. It therefore largely outperforms the diffusion baseline in this setting and also generalizes better than its predictive counterpart.
Multi-dimensional Speech Quality Assessment in Crowdsourcing
The commonly used standard ITU-T Rec.
Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models
Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor, and yields a discriminative and noise-tolerant speaker representation through a hierarchical structure.
Gray Jedi MVDR Post-filtering
Spatial filters can exploit deep-learning-based speech enhancement models to increase their reliability in scenarios with multiple speech sources.
Simulating room transfer functions between transducers mounted on audio devices using a modified image source method
The image source method (ISM) is often used to simulate room acoustics due to its ease of use and computational efficiency.
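The ISM models each wall reflection as a mirrored copy of the source; the delays and attenuations of those image sources form the room impulse response. A minimal first-order sketch for a shoebox room is below; the function name is hypothetical and wall absorption is ignored for brevity.

```python
import numpy as np

def ism_first_order(room, src, mic, c=343.0):
    """First-order image source method for a shoebox room: mirror the
    source across each of the six walls and return (delay_s, 1/r_gain)
    pairs for the direct path plus six first-order reflections."""
    L = np.array(room, float)
    mic = np.array(mic, float)
    images = [np.array(src, float)]  # direct path first
    for axis in range(3):
        for wall in (0.0, L[axis]):
            img = np.array(src, float)
            img[axis] = 2.0 * wall - img[axis]  # mirror across the wall plane
            images.append(img)
    out = []
    for img in images:
        r = np.linalg.norm(img - mic)          # propagation distance
        out.append((r / c, 1.0 / max(r, 1e-9)))  # delay and spherical-spreading gain
    return out
```

Higher-order reflections are generated by mirroring the images recursively, which is where the method's computational cost grows; full implementations also apply frequency-dependent wall absorption per reflection.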
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
Compared to existing phase-aware speech enhancement methods, it further mitigates the compensation effect between the magnitude and phase by explicit phase estimation, elevating the perceptual quality of enhanced speech.
Separate Anything You Describe
In this work, we introduce AudioSep, a foundation model for open-domain audio source separation with natural language queries.
The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions
In this work, SE models are trained and tested on several different languages, using self-supervised representations as loss-function representations; these representations are themselves trained on different language combinations and with differing network structures.