Audio Generation

16 papers with code · Audio

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Greatest papers with code

GANSynth: Adversarial Neural Audio Synthesis

ICLR 2019 tensorflow/magenta

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

AUDIO GENERATION

WaveNet: A Generative Model for Raw Audio

12 Sep 2016 maciejkula/spotlight

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.

AUDIO GENERATION · SPEECH SYNTHESIS

DDSP: Differentiable Digital Signal Processing

ICLR 2020 magenta/ddsp

In this paper, we introduce the Differentiable Digital Signal Processing (DDSP) library, which enables direct integration of classic signal processing elements with deep learning methods.

AUDIO GENERATION

Generating Long Sequences with Sparse Transformers

Preprint 2019 openai/sparse_attention

Transformers are powerful sequence models, but require time and memory that grow quadratically with the sequence length.

AUDIO GENERATION · IMAGE GENERATION · LANGUAGE MODELLING
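
As a rough illustration of the quadratic growth noted in the Sparse Transformers entry above, the sketch below counts the pairwise scores in one dense self-attention matrix. The function names, the 4-byte (float32) score size, and the single-head, single-layer setting are assumptions made for this illustration; they are not taken from the paper or from openai/sparse_attention.

```python
def dense_attention_entries(seq_len: int) -> int:
    """Pairwise scores in one dense self-attention matrix (seq_len x seq_len)."""
    return seq_len * seq_len


def dense_attention_bytes(seq_len: int, bytes_per_score: int = 4) -> int:
    """Memory for one such matrix, assuming float32 scores (an assumption)."""
    return dense_attention_entries(seq_len) * bytes_per_score


if __name__ == "__main__":
    # Illustrative sequence lengths only; not values reported in the paper.
    for n in (1_024, 4_096, 16_384):
        mib = dense_attention_bytes(n) / 2**20
        print(f"{n:>6} positions: {mib:,.0f} MiB per attention matrix")
```

Under these assumptions, a sequence four times longer needs sixteen times as many scores, which is why dense attention becomes costly for long sequences such as raw audio.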

Adversarial Audio Synthesis

ICLR 2019 chrisdonahue/wavegan

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales.

AUDIO GENERATION · IMAGE GENERATION

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

22 Dec 2016 soroushmehr/sampleRNN_ICLR2017

In this paper we propose a novel model for unconditional audio generation based on generating one audio sample at a time.

AUDIO GENERATION

Audio Super Resolution using Neural Networks

2 Aug 2017 kuleshov/audio-super-res

We introduce a new audio processing technique that increases the sampling rate of signals such as speech or music using deep convolutional neural networks.

AUDIO SUPER-RESOLUTION

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

NeurIPS 2019 liusongxiang/StarGAN-Voice-Conversion

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

AUDIO GENERATION · VOICE CONVERSION

MelNet: A Generative Model for Audio in the Frequency Domain

ICLR 2020 fatchord/MelNet

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.

AUDIO GENERATION · MUSIC GENERATION · SPEECH SYNTHESIS · TEXT-TO-SPEECH SYNTHESIS
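
The "tens of thousands of timesteps" figure in the MelNet entry above follows from simple arithmetic: the number of waveform samples is the sampling rate multiplied by the duration. The sketch below works this out for a few commonly used sampling rates. The function name and the chosen rates are assumptions for illustration and are not settings taken from the paper or from fatchord/MelNet.

```python
def num_timesteps(sample_rate_hz: int, seconds: float) -> int:
    """Number of waveform samples in a clip: sampling rate times duration."""
    return int(round(sample_rate_hz * seconds))


if __name__ == "__main__":
    # Common audio sampling rates, used here only as examples.
    for rate in (16_000, 22_050, 44_100):
        print(f"{rate} Hz: {num_timesteps(rate, 1.0)} timesteps per second")
```

At any of these rates, a model that works directly on the waveform has to relate samples that are many thousands of steps apart to capture structure lasting even a few seconds.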

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

NeurIPS 2019 joansj/blow

End-to-end models for raw audio generation are a challenge, especially if they have to work with non-parallel data, which is a desirable setup in many situations.

AUDIO GENERATION · VOICE CONVERSION