Audio Generation

60 papers with code • 3 benchmarks • 8 datasets

Audio generation (synthesis) is the task of generating raw audio such as speech.

(Image credit: MelNet)

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

jishengpeng/languagecodec 19 Feb 2024

We also validate the efficiency of Language-Codec on downstream speech language models.

133

Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

kikyo-16/airgen 14 Feb 2024

We apply this method to fine-tune MusicGen, a leading autoregressive music generation model.

15

Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation

happylittlecat2333/Auffusion 2 Jan 2024

Drawing inspiration from state-of-the-art text-to-image (T2I) diffusion models, we introduce Auffusion, a text-to-audio (TTA) system that adapts T2I model frameworks to the TTA task by leveraging their inherent generative strengths and precise cross-modal alignment.

108

Speech collage: code-switched audio generation by collaging monolingual corpora

jsalt2022codeswitchingasr/generating-code-switched-audio 27 Sep 2023

Designing effective automatic speech recognition (ASR) systems for code-switching (CS) often depends on the availability of transcribed CS resources.

7

Invisible Watermarking for Audio Generation Diffusion Models

xirongc/watermark-audio-diffusion 22 Sep 2023

Diffusion models have gained prominence in the image domain for their capabilities in data generation and transformation, achieving state-of-the-art performance in various tasks in both image and audio domains.

9

Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

Bai-YT/ConsistencyTTA 19 Sep 2023

Diffusion models power a vast majority of text-to-audio (TTA) generation methods.

9

An Initial Exploration: Learning to Generate Realistic Audio for Silent Video

jaxwagner/sound_from_video 23 Aug 2023

Generating realistic audio effects for movies and other media is a challenging task that is accomplished today primarily through physical techniques known as Foley art.

2

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

heng-hw/V2A-Mapper 18 Aug 2023

In this paper, we propose a lightweight solution to vision-to-audio generation by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.

7

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

haoheliu/AudioLDM2 10 Aug 2023

Any audio can be translated into the "language of audio" (LOA), a general audio representation, based on AudioMAE, a self-supervised pre-trained representation learning model.

1,993

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

retrocirce/musicldm 3 Aug 2023

Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation.

106