Music Generation
129 papers with code • 0 benchmarks • 24 datasets
Music Generation is the task of generating music or music-like sounds with a model or algorithm. The goal is to produce a sequence of notes or sound events that resemble existing music in some way, such as sharing its style, genre, or mood.
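As a toy illustration of the task (not a method from any of the papers listed below), a first-order Markov chain over MIDI pitch numbers can generate note sequences that loosely imitate the transition statistics of a training melody:

```python
import random

def train_markov(melody):
    """Count pitch-to-pitch transitions in a training melody (MIDI note numbers)."""
    transitions = {}
    for a, b in zip(melody, melody[1:]):
        transitions.setdefault(a, []).append(b)
    return transitions

def generate(transitions, start, length, seed=0):
    """Sample a new note sequence by walking the transition table."""
    rng = random.Random(seed)
    notes = [start]
    for _ in range(length - 1):
        candidates = transitions.get(notes[-1])
        if not candidates:  # dead end: fall back to the opening note
            candidates = [start]
        notes.append(rng.choice(candidates))
    return notes

# A fragment of "Twinkle Twinkle Little Star" in C major, as MIDI pitches
melody = [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60]
table = train_markov(melody)
print(generate(table, start=60, length=16))
```

Real systems replace this hand-counted table with deep sequence models (transformers, diffusion models) and far richer event vocabularies, but the underlying framing, predicting the next musical event given previous ones, is the same.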
Benchmarks
These leaderboards are used to track progress in Music Generation
Libraries
Use these libraries to find Music Generation models and implementations
Datasets
Latest papers
Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
Music has frequently been compared to language, as the two share several similarities, including the fact that both can be represented as sequences of symbols.
Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion
We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and works with pre-trained diffusion models in a plug-and-play way, achieving training-free guidance for non-differentiable rules for the first time.
Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls
We apply this method to fine-tune MusicGen, a leading autoregressive music generation model.
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Here, we exploit this capability and introduce PAM, a no-reference metric for assessing audio quality for different audio processing tasks.
Combinatorial music generation model with song structure graph analysis
In this work, we propose a symbolic music generation model with the song structure graph analysis network.
MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
However, prior research on deep-learning-based emotional music generation has rarely explored how different musical elements contribute to emotion, let alone deliberately manipulated those elements to alter the emotion of a piece, which hinders fine-grained, element-level control over emotion.
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models.
Mustango: Toward Controllable Text-to-Music Generation
Through extensive experiments, we show that the quality of the music generated by Mustango is state-of-the-art, and the controllability through music-specific text prompts greatly outperforms other models such as MusicGen and AudioLDM2.
Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI
This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE), configurations of latent space in the AI model (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, Classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model.
Music ControlNet: A model similar to SD ControlNet that can accurately control music generation
While the image-domain Uni-ControlNet method already allows generation with any subset of controls, we devise a new strategy to allow creators to input controls that are only partially specified in time.