Audio Generation
64 papers with code • 3 benchmarks • 9 datasets
Audio generation (synthesis) is the task of generating raw audio, such as speech.
(Image credit: MelNet)
Latest papers with no code
Bass Accompaniment Generation via Latent Diffusion
At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent diffusion model that takes as input the latent encoding of a mix and generates the latent encoding of a corresponding stem.
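The pipeline described above (compress audio into latents, then run a conditional diffusion model that maps a mix latent to a stem latent) can be sketched in a toy form. Everything here is a stand-in for illustration: the random orthogonal projection plays the role of the learned autoencoder, and `denoiser` plays the role of the trained conditional network; shapes, schedules, and step counts are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "autoencoder": a random orthogonal projection stands in for the
# learned, invertible compression of waveform frames into latents.
frame_len, latent_dim = 64, 16
W = np.linalg.qr(rng.normal(size=(frame_len, frame_len)))[0][:, :latent_dim]

def encode(frames):   # (n, frame_len) -> (n, latent_dim)
    return frames @ W

def decode(latents):  # (n, latent_dim) -> (n, frame_len)
    return latents @ W.T

def denoiser(z_t, z_mix, t):
    # Placeholder for the trained conditional network that predicts the
    # clean stem latent from a noisy one, conditioned on the mix latent.
    return 0.9 * z_t + 0.1 * z_mix

def reverse_step(z_t, z_mix, t, beta=0.02):
    z0_hat = denoiser(z_t, z_mix, t)
    noise = rng.normal(size=z_t.shape) if t > 1 else 0.0
    return z0_hat + np.sqrt(beta) * noise

mix = rng.normal(size=(4, frame_len))  # four frames of a "mix"
z_mix = encode(mix)
z = rng.normal(size=z_mix.shape)       # stem latent starts as pure noise
for t in range(10, 0, -1):             # iterative conditional denoising
    z = reverse_step(z, z_mix, t)
stem = decode(z)                       # invert latents back to frames
print(stem.shape)
```

The key structural idea is that diffusion runs entirely in the compact latent space, and only the final latent is decoded back to a waveform-length representation.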
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
The advent of Large Models marks a new era in machine learning, significantly outperforming smaller models by leveraging vast datasets to capture and synthesize complex patterns.
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation.
Masked Audio Generation using a Single Non-Autoregressive Transformer
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.
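Masked generative decoding of this kind typically starts from a fully masked token sequence and fills in positions over a few parallel steps, committing the model's most confident predictions first. Below is a minimal sketch of that loop under stated assumptions: the `model_confidences` function is a hypothetical stand-in (returning random predictions) for the non-autoregressive transformer, and the schedule that unmasks an equal fraction per step is one simple choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK, vocab, seq_len = -1, 32, 16

def model_confidences(tokens):
    # Stand-in for the transformer: for every position it returns a
    # predicted token id and a confidence score.
    preds = rng.integers(0, vocab, size=tokens.shape)
    conf = rng.random(tokens.shape)
    return preds, conf

def masked_decode(steps=4):
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        preds, conf = model_confidences(tokens)
        conf[tokens != MASK] = -np.inf  # never revisit committed tokens
        # Unmask the most confident fraction of remaining positions.
        n_fill = int(np.ceil((tokens == MASK).sum() / (steps - step)))
        fill = np.argsort(conf)[-n_fill:]
        tokens[fill] = preds[fill]
    return tokens

out = masked_decode()
print((out != MASK).all())  # every position decoded after `steps` passes
```

Because each pass predicts all masked positions at once, the number of model calls is the (small, fixed) step count rather than the sequence length, which is the source of the speedup over autoregressive decoding.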
Efficient Parallel Audio Generation using Group Masked Language Modeling
We present a fast and high-quality codec language model for parallel audio generation.
Audiobox: Unified Audio Generation with Natural Language Prompts
Research communities have made great progress over the past year in advancing the performance of large-scale audio generative models for a single modality (speech, sound, or music) by adopting more powerful generative models and scaling data.
Diffusion-EXR: Controllable Review Generation for Explainable Recommendation via Diffusion Models
Denoising Diffusion Probabilistic Model (DDPM) has shown great competence in image and audio generation tasks.
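For context, the DDPM forward (noising) process mentioned here has a well-known closed form: q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I), where ᾱ_t is the cumulative product of (1 − β_t). A minimal sketch, using the standard linear β schedule and a toy sinusoid as the "audio" signal (the schedule endpoints and signal are illustrative choices):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # standard linear noise schedule
alphas_bar = np.cumprod(1.0 - betas) # cumulative signal-retention factor

def q_sample(x0, t, rng):
    # Sample x_t ~ q(x_t | x_0) in closed form, without iterating steps.
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0, 8 * np.pi, 256))  # toy "audio" signal
x_mid = q_sample(x0, T // 2, rng)            # partially noised
x_end = q_sample(x0, T - 1, rng)             # nearly pure Gaussian noise
print(x_end.shape)
```

Training a DDPM amounts to regressing the injected noise `eps` from `x_t` and `t`; generation then inverts this process step by step starting from Gaussian noise.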
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling
We introduce a multi-modal diffusion model tailored for the bi-directional conditional generation of video and audio.
SEFGAN: Harvesting the Power of Normalizing Flows and GANs for Efficient High-Quality Speech Enhancement
This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE).
tinyCLAP: Distilling Contrastive Language-Audio Pretrained Models
Contrastive Language-Audio Pretraining (CLAP) has become crucially important in the field of audio and speech processing.