Paper

Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Latent Chord Labels and Features

This paper describes a statistically-principled semi-supervised method of automatic chord estimation (ACE) that can make effective use of music signals regardless of the availability of chord annotations. The typical approach to ACE is to train a deep classification model (neural chord estimator) in a supervised manner by using only annotated music signals. In this discriminative approach, prior knowledge about chord label sequences (model output) has scarcely been taken into account. In contrast, we propose a unified generative and discriminative approach in the framework of amortized variational inference. More specifically, we formulate a deep generative model that represents the generative process of chroma vectors (observed variables) from discrete labels and continuous features (latent variables), which are assumed to follow a Markov model favoring self-transitions and a standard Gaussian distribution, respectively. Given chroma vectors as observed data, the posterior distributions of the latent labels and features are computed approximately by using deep classification and recognition models, respectively. These three models form a variational autoencoder and can be trained jointly in a semi-supervised manner. The experimental results show that the regularization of the classification model based on the Markov prior of chord labels and the generative model of chroma vectors improved the performance of ACE even under the supervised condition. The semi-supervised learning using additional non-annotated data can further improve the performance.

Results in Papers With Code
(↓ scroll down to see all results)