Speech Enhancement

54 papers with code · Speech

Speech enhancement is the task of taking a noisy speech input and producing an enhanced speech output.

(Image credit: A Fully Convolutional Neural Network For Speech Enhancement)

Greatest papers with code

Spleeter: A Fast And State-of-the Art Music Source Separation Tool With Pre-trained Models

ISMIR 2019 Late-Breaking/Demo 2019 deezer/spleeter

We present and release a new tool for music source separation with pre-trained models called Spleeter. Spleeter was designed with ease of use, separation performance and speed in mind.

Ranked #3 on Music Source Separation on MUSDB18 (using extra training data)

MUSIC SOURCE SEPARATION · SPEECH ENHANCEMENT

Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

22 Apr 2020 espnet/espnet

To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech.

DATA AUGMENTATION · END-TO-END SPEECH RECOGNITION · SPEECH ENHANCEMENT · SPEECH RECOGNITION

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

20 Sep 2018 facebookresearch/demucs

The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency in calculating the spectrograms.

MUSIC SOURCE SEPARATION · SPEAKER SEPARATION · SPEECH ENHANCEMENT · SPEECH SEPARATION
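
The time-frequency masking formulation that the Conv-TasNet abstract above contrasts itself with can be illustrated on a single frame. The sketch below is not taken from the paper or from any of the listed repositories: the function names, the naive DFT, the 16-sample frame, and the hand-picked binary mask are all assumptions chosen to keep the example dependency-free. It shows the decoupling the abstract mentions: the mask scales only the magnitude of each frequency bin, while the phase of the noisy mixture is reused unchanged.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (illustrative, O(n^2))."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def apply_magnitude_mask(noisy_frame, mask):
    """Scale each bin's magnitude by the mask and keep the noisy phase as-is."""
    spectrum = dft(noisy_frame)
    masked = [
        cmath.rect(gain * abs(bin_value), cmath.phase(bin_value))
        for gain, bin_value in zip(mask, spectrum)
    ]
    return idft(masked)


# Toy data (assumed): a "clean" tone in bin 2 plus an interfering tone in bin 5.
N = 16
clean = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
noise = [0.5 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
noisy = [c + v for c, v in zip(clean, noise)]

# Hand-picked binary mask: keep bin 2 and its mirror bin N - 2, zero the rest.
mask = [1.0 if k in (2, N - 2) else 0.0 for k in range(N)]

enhanced = apply_magnitude_mask(noisy, mask)
max_error = max(abs(e - c) for e, c in zip(enhanced, clean))
```

In this toy case the two tones occupy separate bins, so reusing the noisy phase costs nothing and the clean frame is recovered up to rounding error. When speech and noise overlap in the same bins, the reused phase no longer matches the clean signal, which is one of the drawbacks the abstract attributes to time-frequency methods.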

SEGAN: Speech Enhancement Generative Adversarial Network

28 Mar 2017 santi-pdp/segan

In contrast to current techniques, we operate at the waveform level, training the model end-to-end, and incorporate 28 speakers and 40 different noise conditions into the same model, such that model parameters are shared across them.

SPEECH ENHANCEMENT

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

11 Oct 2018 mindslab-ai/voicefilter

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.

SPEAKER RECOGNITION · SPEAKER SEPARATION · SPEECH ENHANCEMENT · SPEECH RECOGNITION

Deep learning for minimum mean-square error approaches to speech enhancement

Speech communication 2019 anicolson/DeepXi

MMSE approaches utilising the proposed a priori SNR estimator are able to achieve higher enhanced speech quality and intelligibility scores than recent masking- and mapping-based deep learning approaches.

SPEECH ENHANCEMENT
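
To make the role of the a priori SNR in the entry above more concrete, the sketch below maps an a priori SNR estimate to a per-bin gain using the classical Wiener filter gain, G = ξ / (1 + ξ). This is one well-known gain function driven by the a priori SNR; the abstract does not say which estimators the paper evaluates, so treating the Wiener gain as representative is an assumption, and the function names are hypothetical rather than part of the DeepXi repository.

```python
def wiener_gain(xi):
    """Wiener filter gain for a linear-scale a priori SNR xi (must be >= 0)."""
    if xi < 0:
        raise ValueError("a priori SNR must be non-negative on a linear scale")
    return xi / (1.0 + xi)


def wiener_gain_from_db(xi_db):
    """Same gain, with the a priori SNR given in decibels."""
    xi = 10.0 ** (xi_db / 10.0)  # convert dB to a linear power ratio
    return wiener_gain(xi)


# Gains for a few assumed a priori SNR values, in dB.
example_gains = {snr_db: wiener_gain_from_db(snr_db) for snr_db in (-10.0, 0.0, 20.0)}
```

The gain approaches 1 when speech dominates a bin and approaches 0 when noise dominates, so the quality of the enhanced output depends directly on how accurately the a priori SNR is estimated.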

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement

13 May 2019 anicolson/DeepXi

Adversarial loss in a conditional generative adversarial network (GAN) is not designed to directly optimize evaluation metrics of a target task, and thus, may not always guide the generator in a GAN to generate data with improved metric scores.

SPEECH ENHANCEMENT

Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020 microsoft/DNS-Challenge

This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement.

SPEECH ENHANCEMENT

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

31 Aug 2018 santi-pdp/segan_pytorch

Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech.

SPEECH ENHANCEMENT

Language and Noise Transfer in Speech Enhancement Generative Adversarial Network

18 Dec 2017 santi-pdp/segan_pytorch

In this work, we present the results of adapting a speech enhancement generative adversarial network by finetuning the generator with small amounts of data.

SPEECH ENHANCEMENT