Audio Generation
64 papers with code • 3 benchmarks • 8 datasets
Audio generation (synthesis) is the task of generating raw audio such as speech.
(Image credit: MelNet)
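To make the task definition concrete, here is a minimal sketch of what generating "raw audio" means: a model emits a waveform sample by sample, which is then quantized and written out as PCM. The `synthesize` stand-in below is a plain sine oscillator and is illustrative only; real systems in the papers listed here replace it with a neural vocoder or diffusion sampler.

```python
# Minimal illustration of raw-audio generation: produce a waveform in
# [-1, 1], quantize to 16-bit PCM, and write a WAV file. `synthesize`
# is a hypothetical stand-in for a learned generative model.
import wave
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, typical for speech models

def synthesize(duration_s: float = 1.0, freq_hz: float = 440.0) -> np.ndarray:
    """Stand-in generator: returns a float waveform in [-1, 1]."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return 0.5 * np.sin(2 * np.pi * freq_hz * t)

waveform = synthesize()
pcm = (waveform * 32767).astype(np.int16)  # quantize to 16-bit PCM

with wave.open("generated.wav", "wb") as f:
    f.setnchannels(1)            # mono
    f.setsampwidth(2)            # 2 bytes = 16 bits
    f.setframerate(SAMPLE_RATE)
    f.writeframes(pcm.tobytes())
```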
Latest papers with no code
Music Style Transfer With Diffusion Model
Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited.
LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights
Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources.
Text-to-Audio Generation Synchronized with Videos
Extensive evaluations on AudioCaps and T2AV-Bench demonstrate that our T2AV sets a new standard for video-aligned TTA generation in ensuring visual alignment and temporal consistency.
(Un)paired signal-to-signal translation with 1D conditional GANs
I show that a one-dimensional (1D) conditional generative adversarial network (cGAN) with an adversarial training architecture is capable of unpaired signal-to-signal ("sig2sig") translation.
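The paper's exact architecture is not reproduced here; as a rough illustration of the underlying idea, below is a minimal 1D conditional GAN in PyTorch. The generator maps a source signal to a target-domain signal, and the discriminator judges (source, candidate) pairs, which is what makes the GAN "conditional". All module names and hyperparameters are hypothetical, and for brevity the training step shown is the simpler paired (pix2pix-style) setup; the unpaired setting studied in the paper adds further machinery.

```python
# Hypothetical sketch of a 1D conditional GAN for signal-to-signal
# translation (not the paper's architecture).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, ch, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(ch, ch, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(ch, 1, kernel_size=15, padding=7), nn.Tanh(),
        )

    def forward(self, x):              # x: (batch, 1, time)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, ch, 15, stride=4, padding=7), nn.LeakyReLU(0.2),
            nn.Conv1d(ch, ch, 15, stride=4, padding=7), nn.LeakyReLU(0.2),
            nn.Conv1d(ch, 1, 15, padding=7),  # per-patch logits
        )

    def forward(self, src, tgt):       # condition via channel concatenation
        return self.net(torch.cat([src, tgt], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

src = torch.randn(8, 1, 4096)          # toy source-domain batch
tgt = torch.randn(8, 1, 4096)          # toy target-domain batch

# Discriminator step: real pairs vs. generated pairs.
fake = G(src).detach()
real_logits, fake_logits = D(src, tgt), D(src, fake)
d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
         bce(fake_logits, torch.zeros_like(fake_logits))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator.
logits = D(src, G(src))
g_loss = bce(logits, torch.ones_like(logits))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```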
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach to improve sample efficiency of Diffusion and Flow models.
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Thus, instead of training giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.
LLMBind: A Unified Modality-Task Integration Framework
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
Classification Diffusion Models
Denoising diffusion models achieve state-of-the-art results in image, video, and audio generation.
Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization
Diffusion models are becoming widely used in state-of-the-art image, video and audio generation.
Bass Accompaniment Generation via Latent Diffusion
At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent diffusion model that takes as input the latent encoding of a mix and generates the latent encoding of a corresponding stem.
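The paper's code is not reproduced here; the toy sketch below only illustrates the two-component design the abstract describes: an autoencoder compresses waveforms into compact latents, and a denoiser predicts the noise in a stem latent conditioned on the mix latent, with a plain DDPM ancestral-sampling loop run entirely in latent space. `AudioAE`, `Denoiser`, `generate_stem`, and all hyperparameters are hypothetical.

```python
# Hypothetical sketch of mix-conditioned latent diffusion for stem
# generation (not the paper's implementation).
import torch
import torch.nn as nn

class AudioAE(nn.Module):
    """Toy 1D autoencoder: waveform (B, 1, T) <-> latent (B, C, T/4)."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, ch, 9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(ch, ch, 9, stride=2, padding=4),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(ch, ch, 8, stride=2, padding=3), nn.ReLU(),
            nn.ConvTranspose1d(ch, 1, 8, stride=2, padding=3), nn.Tanh(),
        )

class Denoiser(nn.Module):
    """Predicts the noise in the stem latent, conditioned on the mix latent."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * ch + 1, 2 * ch, 5, padding=2), nn.ReLU(),
            nn.Conv1d(2 * ch, ch, 5, padding=2),
        )

    def forward(self, z_t, z_mix, t_frac):
        # Broadcast the normalized timestep as an extra channel.
        t_map = torch.full_like(z_t[:, :1], t_frac)
        return self.net(torch.cat([z_t, z_mix, t_map], dim=1))

ae, eps_model = AudioAE(), Denoiser()
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1 - betas, dim=0)

@torch.no_grad()
def generate_stem(mix_wave):
    z_mix = ae.enc(mix_wave)
    z = torch.randn_like(z_mix)          # start from latent noise
    for t in reversed(range(T)):         # plain DDPM ancestral sampling
        eps = eps_model(z, z_mix, t / T)
        a, ab = 1 - betas[t], alphas_bar[t]
        z = (z - betas[t] / (1 - ab).sqrt() * eps) / a.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)
    return ae.dec(z)                     # decode the latent back to a waveform

stem = generate_stem(torch.randn(1, 1, 4096))  # toy mix input
```

Running diffusion in the compressed latent space rather than on raw samples is what keeps this family of methods tractable: the denoiser only has to model a short, low-dimensional sequence, and the autoencoder handles the reconstruction back to audio.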