Prior studies have analyzed the benefits of the resulting scale invariance of the weights for the gradient descent (GD) optimizers: it leads to a stabilized training due to the auto-tuning of step sizes.
Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions.
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio.
We introduce a probabilistic approach to unify deep continual learning with open set recognition, based on variational Bayesian inference.
In this work, we first describe a CNN based approach for weakly supervised training of audio events.
Based on this, we introduce a method for descriptor-based synthesis and show that we can control the descriptors of an instrument while keeping its timbre structure.
Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input.
Long-range dependencies modeling, widely used in capturing spatiotemporal correlation, has shown to be effective in CNN dominated computer vision tasks.
Ranked #18 on Object Detection on PASCAL VOC 2007
The proposed method uses an auxiliary classifier, trained on data that is known to be in-distribution, for detection and relabelling.
In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality.