Speech Enhancement
214 papers with code • 12 benchmarks • 19 datasets
Speech Enhancement is a signal processing task that involves improving the quality of speech signals captured under noisy or degraded conditions. The goal of speech enhancement is to make speech signals clearer, more intelligible, and more pleasant to listen to, benefiting applications such as voice recognition, teleconferencing, and hearing aids.
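A classic pre-deep-learning baseline for this task is spectral subtraction. The sketch below is illustrative only (toy values, a hypothetical `spectral_subtract` helper): each frequency bin of a noisy magnitude spectrum is attenuated by an estimated noise magnitude, with a spectral floor to avoid zeroed bins.

```python
# Minimal spectral-subtraction sketch (a generic baseline, not taken
# from any specific paper listed on this page).
def spectral_subtract(noisy_mag, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude from each frequency bin,
    clamping to a small fraction of the noisy magnitude so no bin is
    driven all the way to zero (which causes "musical noise")."""
    return [max(m - n, floor * m) for m, n in zip(noisy_mag, noise_mag)]

# Toy example: one frame of magnitudes with a flat noise estimate.
frame = [1.0, 0.2, 0.5, 0.1]
noise = [0.1, 0.1, 0.1, 0.1]
enhanced = spectral_subtract(frame, noise)
```

Real systems estimate the noise spectrum adaptively (e.g. from non-speech frames) rather than assuming it is known.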
(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)
Most implemented papers
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.
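The masking idea can be sketched in a few lines (hypothetical helper names, not the paper's code): in the real system a neural network predicts a time-frequency mask from the mixture spectrogram plus a speaker embedding of the reference signal; the oracle "ideal ratio mask" below shows what that predicted mask approximates.

```python
# Illustrative sketch of speaker-targeted spectrogram masking (not
# VoiceFilter's actual implementation).
def ideal_ratio_mask(target_mag, interferer_mag, eps=1e-8):
    """Fraction of each time-frequency bin's magnitude that belongs
    to the target speaker; this is the oracle the network learns to
    predict from the mixture and a speaker embedding."""
    return [t / (t + i + eps) for t, i in zip(target_mag, interferer_mag)]

def apply_mask(mixture_mag, mask):
    """Element-wise masking of the mixture magnitudes."""
    return [m * g for m, g in zip(mixture_mag, mask)]

# One toy frame: target dominates bins 0 and 2, interferer bin 1.
target, interferer = [0.8, 0.1, 0.6], [0.2, 0.9, 0.0]
mixture = [t + i for t, i in zip(target, interferer)]
recovered = apply_mask(mixture, ideal_ratio_mask(target, interferer))
```

With the oracle mask, `recovered` is close to the target magnitudes; the training objective pushes the predicted mask toward this behaviour.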
Language and Noise Transfer in Speech Enhancement Generative Adversarial Network
In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.
Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks
Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech.
Improved Speech Enhancement with the Wave-U-Net
We study the use of the Wave-U-Net architecture for speech enhancement, a model introduced by Stoller et al. for separating music vocals and accompaniment.
rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method
Finally, an a posteriori SNR-weighted energy difference is applied to the extended pitch segments of the denoised speech signal to detect voice activity.
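A drastically simplified, energy-only sketch of the final VAD decision (illustrative only; rVAD's actual algorithm also involves denoising, pitch segments, and the a posteriori SNR weighting described above):

```python
# Simplified energy-threshold VAD sketch (not rVAD's implementation).
# A frame counts as speech if its energy exceeds the estimated noise
# floor by a margin expressed in dB.
def energy_vad(frame_energies, noise_floor, margin_db=6.0):
    """Return a per-frame speech/non-speech decision."""
    threshold = noise_floor * 10 ** (margin_db / 10.0)
    return [e > threshold for e in frame_energies]

decisions = energy_vad([1.0, 10.0, 0.5], noise_floor=1.0)
```

A fixed energy threshold fails under non-stationary noise, which is exactly why rVAD adds the SNR-weighted difference over pitch segments.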
Spleeter: A Fast And State-of-the-Art Music Source Separation Tool With Pre-trained Models
We present and release a new tool for music source separation with pre-trained models called Spleeter. Spleeter was designed with ease of use, separation performance and speed in mind.
Real Time Speech Enhancement in the Waveform Domain
The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
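The causal/non-causal distinction for waveform-domain models can be illustrated with a plain FIR filter (a generic sketch, not this paper's architecture): a causal filter sees only past samples, which is what makes real-time streaming possible.

```python
# Causal 1-D filtering sketch (illustrative; the paper's model is a
# neural network, not a fixed FIR filter). Left-padding with zeros
# means output sample i depends only on input samples <= i.
def causal_fir(x, taps):
    """y[i] = sum_j taps[j] * x[i - j], using only past/present input."""
    k = len(taps)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(taps[j] * padded[i + k - 1 - j] for j in range(k))
            for i in range(len(x))]

y = causal_fir([1, 2, 3], [1.0, 1.0])  # two-tap moving sum
```

A non-causal model would also pad (or look) to the right, trading latency for the ability to use future context.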
MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
The discrepancy between the cost function used for training a speech enhancement model and human auditory perception usually makes the quality of enhanced speech unsatisfactory.
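The MetricGAN training idea can be caricatured in a few lines (schematic only, not the paper's implementation): a discriminator is trained as a differentiable surrogate for a perceptual metric such as PESQ, and the generator is trained to push the surrogate's predicted score toward its maximum, closing the gap between the training loss and what listeners perceive.

```python
# Schematic MetricGAN-style losses (illustrative; scores are assumed
# normalized to [0, 1], with 1.0 the best possible metric value).
def discriminator_loss(predicted_score, true_score):
    """D learns to regress the true (normalized) metric score."""
    return (predicted_score - true_score) ** 2

def generator_loss(predicted_score, max_score=1.0):
    """G is rewarded when D predicts the maximum score for its output."""
    return (predicted_score - max_score) ** 2
```

The point of the surrogate is that metrics like PESQ are non-differentiable, so they cannot be used directly as a training loss.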
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
Speech Denoising Convolutional Neural Network trained with Deep Feature Losses
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.