SpecGAN

Introduced by Donahue et al. in Adversarial Audio Synthesis

SpecGAN is a generative adversarial network method for spectrogram-based, frequency-domain audio generation. It represents audio as an image-like spectrogram, which makes the problem well suited to GANs designed for image generation. The spectrogram representation can be approximately inverted back to audio.

To process audio into suitable spectrograms, the authors perform the short-time Fourier transform with 16 ms windows and an 8 ms stride, resulting in 128 frequency bins linearly spaced from 0 to 8 kHz. They take the magnitude of the resulting spectra and scale the amplitude values logarithmically to better align with human perception. They then normalize each frequency bin to have zero mean and unit variance, clip the spectra to $3$ standard deviations, and rescale to $\left[-1, 1\right]$.
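The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes a 16 kHz sampling rate (implied by the 8 kHz upper frequency), so that 16 ms and 8 ms correspond to 256 and 128 samples. The function names, the Hann window, the `eps` offset inside the logarithm, and dropping the Nyquist bin to arrive at 128 bins are all assumptions made for this example.

```python
import numpy as np

def log_magnitude_stft(audio, sample_rate=16000, window_ms=16, stride_ms=8, eps=1e-6):
    """Log-magnitude STFT of a 1-D signal; shape (frames, 128) at the defaults."""
    win = sample_rate * window_ms // 1000   # 256 samples at 16 kHz
    hop = sample_rate * stride_ms // 1000   # 128 samples at 16 kHz
    n_frames = 1 + (len(audio) - win) // hop
    window = np.hanning(win)                # window choice is an assumption
    frames = np.stack([audio[i * hop:i * hop + win] * window for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)   # win // 2 + 1 = 129 bins
    magnitude = np.abs(spectra[:, :-1])     # assumption: drop the Nyquist bin to keep 128
    return np.log(magnitude + eps)          # eps avoids log(0); its value is an assumption

def normalize(log_mag, mean, std, clip_std=3.0):
    """Standardize each frequency bin, clip to 3 standard deviations, rescale to [-1, 1]."""
    standardized = (log_mag - mean) / std
    return np.clip(standardized, -clip_std, clip_std) / clip_std

rng = np.random.default_rng(0)
audio = rng.standard_normal(16384)          # noise standing in for about 1 s of 16 kHz audio
log_mag = log_magnitude_stft(audio)
# Per-bin statistics; computed from this single clip purely for illustration.
mean, std = log_mag.mean(axis=0), log_mag.std(axis=0)
spec = normalize(log_mag, mean, std)
print(spec.shape, spec.min(), spec.max())
```

Keeping `mean` and `std` around matters: the same per-bin statistics are needed later to map a generated spectrogram back to magnitudes.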

They then apply the DCGAN approach to the resulting spectra.
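A generated spectrogram contains only normalized magnitudes, so converting it to audio means undoing the normalization and estimating the missing phase. The sketch below illustrates one way to do this with a basic Griffin-Lim loop in NumPy. It is a hedged example under stated assumptions, not the authors' code: the helper names, the Hann window, the `eps` value, the zero-filled Nyquist bin, and the default iteration count are all choices made for this illustration, and `mean` and `std` are assumed to be the per-bin statistics used during preprocessing.

```python
import numpy as np

def denormalize(spec, mean, std, clip_std=3.0, eps=1e-6):
    """Map a [-1, 1] spectrogram back to linear STFT magnitudes."""
    log_mag = spec * clip_std * std + mean
    return np.maximum(np.exp(log_mag) - eps, 0.0)

def stft(x, win=256, hop=128):
    """Complex STFT with a Hann window; shape (frames, win // 2 + 1)."""
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spectra, win=256, hop=128):
    """Windowed overlap-add inverse of stft()."""
    window = np.hanning(win)
    frames = np.fft.irfft(spectra, n=win, axis=1) * window
    out = np.zeros((len(frames) - 1) * hop + win)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win] += frame
        norm[i * hop:i * hop + win] += window ** 2
    return out / np.maximum(norm, 1e-8)     # guard against division by zero at the edges

def griffin_lim(magnitude, n_iter=16, win=256, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    # Assumption: re-append an all-zero Nyquist bin so each row has win // 2 + 1 values.
    magnitude = np.pad(magnitude, ((0, 0), (0, 1)))
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random initial phase
    for _ in range(n_iter):
        audio = istft(magnitude * phase, win, hop)
        phase = np.exp(1j * np.angle(stft(audio, win, hop)))  # keep phase, discard magnitude
    return istft(magnitude * phase, win, hop)
```

Calling `griffin_lim(denormalize(spec, mean, std))` on a spectrogram with 127 frames and 128 bins yields a waveform of 16384 samples. The result is only approximate, because clipping discards some magnitude information and the phase is estimated rather than recovered.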

Source: Adversarial Audio Synthesis

Tasks

Task	Papers	Share
Audio Generation	1	50.00%
Image Generation	1	50.00%

Components

Component	Type
DCGAN	Generative Models
Griffin-Lim Algorithm	Phase Reconstruction
Phase Shuffle	Audio Artifact Removal
Tanh Activation	Activation Functions
WGAN-GP Loss	Loss Functions

Categories

Generative Audio Models