Audio Generation
60 papers with code • 3 benchmarks • 8 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
(Image credit: MelNet)
Most implemented papers
WaveNet: A Generative Model for Raw Audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
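A minimal sketch of WaveNet's core building block, the 1-D dilated causal convolution, which the network stacks to grow its receptive field exponentially with depth. This is an illustrative toy in plain Python, not the paper's implementation; the function name and weights are made up for the example.

```python
def dilated_causal_conv(x, weights, dilation):
    """Convolve x with `weights`, looking only at past samples spaced
    `dilation` steps apart (causal: output at t sees inputs at <= t)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:          # samples before the start are treated as absent
                acc += w * x[idx]
        out.append(acc)
    return out

# With kernel size k and dilations d_1..d_L, the receptive field is
# 1 + sum((k - 1) * d_l); k=2 with dilations 1, 2, 4, ..., 512 (as in
# WaveNet) covers 1024 past samples in ten layers.
signal = [1.0, 0.0, 0.0, 0.0, 0.0]
print(dilated_causal_conv(signal, [0.5, 0.5], dilation=2))
# -> [0.5, 0.0, 0.5, 0.0, 0.0]  (the impulse reappears 2 steps later)
```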
Adversarial Audio Synthesis
Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.
GANSynth: Adversarial Neural Audio Synthesis
Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.
It's Raw! Audio Generation with State-Space Models
SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.
MelNet: A Generative Model for Audio in the Frequency Domain
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
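The scale mismatch above can be made concrete with a short-time Fourier transform: one second of 16 kHz audio is 16,000 timesteps as a waveform, but only a few dozen frames along the time axis of a spectrogram. The STFT settings below (FFT size, hop) are assumed for illustration, not MelNet's exact front end.

```python
import numpy as np

sr, n_fft, hop = 16000, 1024, 256                     # assumed STFT settings
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)      # 1 s of a 440 Hz tone

# Slice the waveform into overlapping windowed frames and take magnitudes.
frames = [x[s:s + n_fft] * np.hanning(n_fft)
          for s in range(0, len(x) - n_fft + 1, hop)]
spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))  # magnitude spectrogram

print(spec.shape)  # (59, 513): ~59 time frames instead of 16,000 samples
```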
AudioLM: a Language Modeling Approach to Audio Generation
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.
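A toy sketch of the generation loop this describes, not SampleRNN itself: each sample is drawn one at a time, conditioned on the samples generated so far. The "model" here is a hand-written linear predictor standing in for the learned RNN, and all parameters are invented for the example.

```python
import random

def generate(model, n_samples, context=4, seed=0):
    """Autoregressive loop: emit one sample at a time, feeding each
    generated sample back in as conditioning for the next."""
    rng = random.Random(seed)
    audio = [0.0] * context                      # zero-padded initial context
    for _ in range(n_samples):
        pred = model(audio[-context:])           # condition on recent samples
        audio.append(pred + rng.gauss(0, 0.01))  # sample around the prediction
    return audio[context:]

# Stand-in "model": a damped oscillator continuing the last two samples.
toy_model = lambda ctx: 1.9 * ctx[-1] - 0.95 * ctx[-2]

samples = generate(toy_model, n_samples=100)
print(len(samples))  # 100 samples, generated strictly one at a time
```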
Audio Super Resolution using Neural Networks
We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.
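For contrast with the learned approach, the classical baseline for increasing a signal's sampling rate is plain interpolation, sketched below. Linear interpolation fills in intermediate samples but cannot recover lost high-frequency content, which is what the neural model is trained to predict; the helper name is made up for this sketch.

```python
def upsample_2x_linear(x):
    """Double the sampling rate by inserting one linearly
    interpolated sample between each pair of neighbours."""
    out = []
    for a, b in zip(x, x[1:]):
        out.append(a)
        out.append((a + b) / 2.0)
    out.append(x[-1])
    return out

print(upsample_2x_linear([0.0, 1.0, 0.0]))  # -> [0.0, 0.5, 1.0, 0.5, 0.0]
```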
Assisted Sound Sample Generation with Musical Conditioning in Adversarial Auto-Encoders
Training data subsets can be directly visualized in the model's 3D latent representation.
Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion
End-to-end models for raw audio generation are challenging, especially when they must work with non-parallel data, which is a desirable setup in many situations.