The Lakh Pianoroll Dataset (LPD) is a collection of 174,154 multitrack pianorolls derived from the Lakh MIDI Dataset (LMD).

Getting the dataset

We provide multiple subsets and versions of the dataset (see here). The dataset is available here.

Using LPD

The multitrack pianorolls in LPD are stored in a special format for efficient I/O and to save space. We recommend loading the data with Pypianoroll (the dataset was created using Pypianoroll v0.3.0). See here to learn how the data is stored and how to load it properly.
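
As a rough illustration of the recommended loading path, the sketch below walks a local copy of the dataset and loads one file with Pypianoroll. It makes several assumptions not stated above: that the dataset has been extracted to a directory (the name `lpd` is a placeholder), that the files use the `.npz` extension, and that the installed Pypianoroll release provides `pypianoroll.load`. The dataset was created with v0.3.0, whose API may differ, so check the documentation for your installed version. The helper names `iter_npz_files` and `load_pianoroll` are hypothetical.

```python
from pathlib import Path


def iter_npz_files(root):
    """Yield every .npz file found under `root`, in sorted order.

    Assumes the pianorolls are stored as .npz files (an assumption
    about the on-disk layout, not something the dataset page states).
    """
    yield from sorted(Path(root).rglob("*.npz"))


def load_pianoroll(path):
    """Load one multitrack pianoroll with Pypianoroll.

    Pypianoroll is imported lazily so that the file-discovery helper
    above can be used even when the library is not installed.
    Assumes a release that exposes `pypianoroll.load`.
    """
    import pypianoroll

    return pypianoroll.load(str(path))


if __name__ == "__main__":
    # "lpd" is a placeholder for wherever the dataset was extracted.
    for npz_path in iter_npz_files("lpd"):
        multitrack = load_pianoroll(npz_path)
        # Report the file name and how many tracks it contains.
        print(npz_path.name, len(multitrack.tracks))
        break  # only inspect the first file
```

Keeping file discovery separate from loading makes it easy to swap in a different loader if your Pypianoroll version uses another entry point.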

License

Lakh Pianoroll Dataset is a derivative of Lakh MIDI Dataset by Colin Raffel, used under CC BY 4.0. Lakh Pianoroll Dataset is licensed under CC BY 4.0 by Hao-Wen Dong and Wen-Yi Hsiao.

Please cite the following papers if you use the Lakh Pianoroll Dataset in published work:

  • Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, and Yi-Hsuan Yang, "MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment," in Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), 2018.

  • Colin Raffel, "Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching," PhD Thesis, 2016.
