Groove2Groove: One-Shot Music Style Transfer with Supervision from Synthetic Data

Style transfer is the task of modifying an image, video, audio clip, or musical piece so that its style matches that of a given example. Although the task has interesting practical applications in the music industry, it has so far received little attention from the audio and music processing community. In this paper, we present Groove2Groove, a one-shot style transfer method for symbolic music, focusing on accompaniment styles in popular music and jazz. We propose an encoder-decoder neural network for the task, along with a synthetic data generation scheme that supplies it with parallel training examples. This synthetic parallel data allows us to tackle the style transfer problem with end-to-end supervised learning, employing powerful techniques from natural language processing. We experimentally demonstrate the model's style transfer performance using existing and newly proposed metrics, and we also explore the possibility of style interpolation.
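To make the encoder-decoder setup concrete, below is a minimal PyTorch sketch of the one-shot idea the abstract describes: a style encoder maps a style example to a fixed-size embedding, a content encoder processes the source piece, and a decoder is trained with supervised cross-entropy on a synthetic parallel target. All names, sizes, the GRU layers, and the mean-pooled context (used here in place of a full attention mechanism) are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch only; the vocabulary size, dimensions, and GRU-based
# modules are assumptions, not the configuration used in the paper.
import torch
import torch.nn as nn

VOCAB_SIZE = 512   # assumed size of the symbolic-music event vocabulary
EMBED_DIM = 128
HIDDEN_DIM = 256

class StyleEncoder(nn.Module):
    """Encodes a style example into a fixed-size style embedding."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)

    def forward(self, style_events):
        _, h = self.rnn(self.embed(style_events))
        return h[-1]  # (batch, HIDDEN_DIM) style embedding

class ContentEncoder(nn.Module):
    """Encodes the content (source) piece into a sequence of states."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)

    def forward(self, content_events):
        out, _ = self.rnn(self.embed(content_events))
        return out  # (batch, time, HIDDEN_DIM)

class Decoder(nn.Module):
    """Generates output events conditioned on content states and style."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM + HIDDEN_DIM, HIDDEN_DIM, batch_first=True)
        self.proj = nn.Linear(2 * HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, targets, content_states, style_vec):
        # Concatenate the style embedding to every decoder input step.
        style = style_vec.unsqueeze(1).expand(-1, targets.size(1), -1)
        out, _ = self.rnn(torch.cat([self.embed(targets), style], dim=-1))
        # Mean-pooled content context stands in for full attention here.
        ctx = content_states.mean(dim=1, keepdim=True).expand_as(out)
        return self.proj(torch.cat([out, ctx], dim=-1))  # per-step logits

# Supervised training on one synthetic parallel pair: the target is the
# content piece rendered in the style of the style example.
style_enc, content_enc, dec = StyleEncoder(), ContentEncoder(), Decoder()
style_x = torch.randint(0, VOCAB_SIZE, (2, 100))    # style example
content_x = torch.randint(0, VOCAB_SIZE, (2, 120))  # content input
target = torch.randint(0, VOCAB_SIZE, (2, 120))     # parallel target

logits = dec(target[:, :-1], content_enc(content_x), style_enc(style_x))
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), target[:, 1:].reshape(-1))

# Style interpolation: blend the embeddings of two style examples
# before decoding to morph smoothly between the two styles.
style_y = torch.randint(0, VOCAB_SIZE, (2, 100))
s_mix = 0.5 * style_enc(style_x) + 0.5 * style_enc(style_y)
```

Because the synthetic data provides aligned input/output pairs, the whole pipeline reduces to standard sequence-to-sequence training; style interpolation then falls out of the architecture simply by mixing style embeddings at inference time.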
