Audio Generation
64 papers with code • 3 benchmarks • 8 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
(Image credit: MelNet)
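To make the task definition concrete, here is a minimal sketch of what generating "raw audio" means: a model emits a waveform sample by sample, which is then quantized and written out as PCM. The `synthesize` stand-in below is a plain sine oscillator and is illustrative only; real systems in the papers listed here replace it with a neural vocoder or diffusion sampler.

```python
# Minimal illustration of raw-audio generation: produce a waveform in
# [-1, 1], quantize to 16-bit PCM, and write a WAV file. `synthesize`
# is a hypothetical stand-in for a learned generative model.
import wave
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, typical for speech models

def synthesize(duration_s: float = 1.0, freq_hz: float = 440.0) -> np.ndarray:
    """Stand-in generator: returns a float waveform in [-1, 1]."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return 0.5 * np.sin(2 * np.pi * freq_hz * t)

waveform = synthesize()
pcm = (waveform * 32767).astype(np.int16)  # quantize to 16-bit PCM

with wave.open("generated.wav", "wb") as f:
    f.setnchannels(1)            # mono
    f.setsampwidth(2)            # 2 bytes = 16 bits
    f.setframerate(SAMPLE_RATE)
    f.writeframes(pcm.tobytes())
```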
Latest papers with no code
Music Style Transfer With Diffusion Model
Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited.
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights
Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources.
Text-to-Audio Generation Synchronized with Videos
Extensive evaluations on AudioCaps and T2AV-Bench demonstrate that our T2AV sets a new standard for video-aligned TTA generation in ensuring visual alignment and temporal consistency.
(Un)paired signal-to-signal translation with 1D conditional GANs
I show that a one-dimensional (1D) conditional generative adversarial network (cGAN) with an adversarial training architecture is capable of unpaired signal-to-signal ("sig2sig") translation.
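The paper's exact architecture is not reproduced here; as a rough illustration of the underlying idea, below is a minimal 1D conditional GAN in PyTorch. The generator maps a source signal to a target-domain signal, and the discriminator judges (source, candidate) pairs, which is what makes the GAN "conditional". All module names and hyperparameters are hypothetical, and for brevity the training step shown is the simpler paired (pix2pix-style) setup; the unpaired setting studied in the paper adds further machinery.

```python
# Hypothetical sketch of a 1D conditional GAN for signal-to-signal
# translation (not the paper's architecture).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, ch, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(ch, ch, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(ch, 1, kernel_size=15, padding=7), nn.Tanh(),
        )

    def forward(self, x):              # x: (batch, 1, time)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, ch, 15, stride=4, padding=7), nn.LeakyReLU(0.2),
            nn.Conv1d(ch, ch, 15, stride=4, padding=7), nn.LeakyReLU(0.2),
            nn.Conv1d(ch, 1, 15, padding=7),  # per-patch logits
        )

    def forward(self, src, tgt):       # condition via channel concatenation
        return self.net(torch.cat([src, tgt], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

src = torch.randn(8, 1, 4096)          # toy source-domain batch
tgt = torch.randn(8, 1, 4096)          # toy target-domain batch

# Discriminator step: real pairs vs. generated pairs.
fake = G(src).detach()
real_logits, fake_logits = D(src, tgt), D(src, fake)
d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
         bce(fake_logits, torch.zeros_like(fake_logits))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator.
logits = D(src, G(src))
g_loss = bce(logits, torch.ones_like(logits))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```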
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach to improve sample efficiency of Diffusion and Flow models.
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Thus, instead of training giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.
LLMBind: A Unified Modality-Task Integration Framework
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
Classification Diffusion Models
Denoising diffusion models achieve state-of-the-art results in image, video, and audio generation.
Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization
Diffusion models are becoming widely used in state-of-the-art image, video and audio generation.
Bass Accompaniment Generation via Latent Diffusion
At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent diffusion model that takes as input the latent encoding of a mix and generates the latent encoding of a corresponding stem.
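The paper's code is not reproduced here; the toy sketch below only illustrates the two-component design the abstract describes: an autoencoder compresses waveforms into compact latents, and a denoiser predicts the noise in a stem latent conditioned on the mix latent, with a plain DDPM ancestral-sampling loop run entirely in latent space. `AudioAE`, `Denoiser`, `generate_stem`, and all hyperparameters are hypothetical.

```python
# Hypothetical sketch of mix-conditioned latent diffusion for stem
# generation (not the paper's implementation).
import torch
import torch.nn as nn

class AudioAE(nn.Module):
    """Toy 1D autoencoder: waveform (B, 1, T) <-> latent (B, C, T/4)."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, ch, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(ch, ch, 9, stride=2, padding=4),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(ch, ch, 8, stride=2, padding=3), nn.ReLU(),
            nn.ConvTranspose1d(ch, 1, 8, stride=2, padding=3), nn.Tanh(),
        )

class Denoiser(nn.Module):
    """Predicts the noise in the stem latent, conditioned on the mix latent."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * ch + 1, 2 * ch, 5, padding=2), nn.ReLU(),
            nn.Conv1d(2 * ch, ch, 5, padding=2),
        )

    def forward(self, z_t, z_mix, t_frac):
        # Broadcast the normalized timestep as an extra channel.
        t_map = torch.full_like(z_t[:, :1], t_frac)
        return self.net(torch.cat([z_t, z_mix, t_map], dim=1))

ae, eps_model = AudioAE(), Denoiser()
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1 - betas, dim=0)

@torch.no_grad()
def generate_stem(mix_wave):
    z_mix = ae.enc(mix_wave)
    z = torch.randn_like(z_mix)          # start from latent noise
    for t in reversed(range(T)):         # plain DDPM ancestral sampling
        eps = eps_model(z, z_mix, t / T)
        a, ab = 1 - betas[t], alphas_bar[t]
        z = (z - betas[t] / (1 - ab).sqrt() * eps) / a.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return ae.dec(z)                     # decode the latent back to a waveform

stem = generate_stem(torch.randn(1, 1, 4096))  # toy mix input
```

Running diffusion in the compressed latent space rather than on raw samples is what keeps this family of methods tractable: the denoiser only has to model a short, low-dimensional sequence, and the autoencoder handles the reconstruction back to audio.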