Audio Generation

64 papers with code • 3 benchmarks • 8 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.
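To make "raw audio" concrete: a waveform is a sequence of amplitude samples at a fixed sample rate, and a generative model's job is to produce such a sequence. The sketch below is only an illustration under stated assumptions. A fixed sine tone stands in for a model's output, and the function names, the 16 kHz sample rate, and the 16-bit PCM format are choices made for this example, not taken from any paper listed here.

```python
import math
import struct
import wave


def synthesize_tone(freq_hz=440.0, duration_s=1.0, sample_rate=16000, amplitude=0.5):
    """Return a list of float samples in [-1, 1] for a sine tone.

    Stand-in for a generative model's output: a real system would
    predict these samples (or a latent decoded into them) instead.
    """
    n_samples = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def write_wav(path, samples, sample_rate=16000):
    """Write mono float samples to a 16-bit PCM WAV file."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)


if __name__ == "__main__":
    # Hypothetical output filename, chosen for this example.
    write_wav("tone.wav", synthesize_tone())
```

One second of audio at 16 kHz is already 16,000 values, which is part of why many of the papers below work in compressed or latent representations rather than directly on samples.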

(Image credit: MelNet)

Latest papers with no code

Music Style Transfer With Diffusion Model

no code yet • 23 Apr 2024

Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited.

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

no code yet • 18 Apr 2024

Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources.

Text-to-Audio Generation Synchronized with Videos

no code yet • 8 Mar 2024

Extensive evaluations on the AudioCaps and T2AV-Bench demonstrate that our T2AV sets a new standard for video-aligned TTA generation in ensuring visual alignment and temporal consistency.

(Un)paired signal-to-signal translation with 1D conditional GANs

no code yet • 5 Mar 2024

I show that a one-dimensional (1D) conditional generative adversarial network (cGAN) with an adversarial training architecture is capable of unpaired signal-to-signal ("sig2sig") translation.

Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models

no code yet • 2 Mar 2024

This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach to improve sample efficiency of Diffusion and Flow models.

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

no code yet • 27 Feb 2024

Thus, instead of training the giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.

LLMBind: A Unified Modality-Task Integration Framework

no code yet • 22 Feb 2024

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Classification Diffusion Models

no code yet • 15 Feb 2024

These approaches achieve state-of-the-art results in image, video, and audio generation.

Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization

no code yet • 3 Feb 2024

Diffusion models are becoming widely used in state-of-the-art image, video and audio generation.

Bass Accompaniment Generation via Latent Diffusion

no code yet • 2 Feb 2024

At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent diffusion model that takes as input the latent encoding of a mix and generates the latent encoding of a corresponding stem.