Audio Source Separation is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals).
Source: Model selection for deep audio source separation via clustering analysis
In this paper, we present an efficient neural network for end-to-end general purpose audio source separation.
Ranked #8 on Speech Separation on wsj0-2mix
Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end.
Ranked #16 on Music Source Separation on MUSDB18
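As a rough illustration of why discarding phase matters, the following sketch splits one frame of a signal into magnitude and phase and compares reconstructions with and without the phase. It is not taken from the paper: it uses a single FFT frame of synthetic noise rather than a full short-time Fourier transform, and the helper names `split_magnitude_phase` and `reconstruct` are hypothetical.

```python
import numpy as np

def split_magnitude_phase(frame):
    """Return the magnitude and phase of the one-sided spectrum of a real frame."""
    spectrum = np.fft.rfft(frame)
    return np.abs(spectrum), np.angle(spectrum)

def reconstruct(magnitude, phase, n):
    """Rebuild a length-n real frame from magnitude and phase."""
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)

# Synthetic stand-in for one audio frame (assumption: 512 samples of noise).
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)

magnitude, phase = split_magnitude_phase(frame)

# Using the true phase recovers the frame up to floating-point error.
with_phase = reconstruct(magnitude, phase, len(frame))
# Dropping the phase (setting it to zero) yields a different waveform.
without_phase = reconstruct(magnitude, np.zeros_like(phase), len(frame))

err_with_phase = np.max(np.abs(frame - with_phase))
err_without_phase = np.max(np.abs(frame - without_phase))
print(err_with_phase, err_without_phase)
```

A model that predicts only magnitudes has to borrow or estimate the phase at reconstruction time, and the frame length chosen here is one of the front-end hyper-parameters the snippet above refers to.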
This paper deals with the problem of audio source separation.
The thud of a bouncing ball, the onset of speech as lips open -- when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals.
We study the use of the Wave-U-Net architecture, a model introduced by Stoller et al. for the separation of music vocals and accompaniment, for speech enhancement.
Ranked #9 on Speech Enhancement on DEMAND
AUDIO SOURCE SEPARATION, SPEECH ENHANCEMENT, SPEECH RECOGNITION
Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers.
Based on this idea, we drive the separator towards outputs deemed as realistic by discriminator networks that are trained to tell apart real from separator samples.
AUDIO SOURCE SEPARATION, DATA AUGMENTATION, MUSIC SOURCE SEPARATION
Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel.
Ranked #1 on Audio Source Separation on MUSIC (multi-source)
Our work is the first to learn audio source separation from large-scale "in the wild" videos containing multiple audio sources per video.
AUDIO DENOISING, AUDIO SOURCE SEPARATION, DENOISING, MULTI-LABEL LEARNING
Clipping the gradient is a known approach to improving gradient descent, but requires hand selection of a clipping threshold hyperparameter.
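For context, the fixed-threshold approach described above can be sketched as follows. This is a minimal NumPy example of standard clipping by norm, written under the assumption that the gradient is a single flat array. The function name `clip_gradient_by_norm` and the threshold values are illustrative and do not come from the paper.

```python
import numpy as np

def clip_gradient_by_norm(grad, threshold):
    """Rescale grad so its L2 norm does not exceed threshold.

    threshold is the hand-picked hyperparameter mentioned above.
    """
    norm = np.linalg.norm(grad)
    if norm > threshold:
        # Keep the direction, shrink the length to the threshold.
        return grad * (threshold / norm)
    return grad

grad = np.array([3.0, 4.0])  # L2 norm is 5

# Norm exceeds the threshold, so the gradient is rescaled to norm 1.
clipped = clip_gradient_by_norm(grad, threshold=1.0)
# Norm is below the threshold, so the gradient is returned unchanged.
unchanged = clip_gradient_by_norm(grad, threshold=10.0)
print(clipped, unchanged)
```

A threshold that is too small slows training by shrinking most updates, while one that is too large never triggers, which is why choosing it by hand is a burden.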