3 dataset results for Recommendation Systems AND Audio

Million Song Dataset

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

10 PAPERS • 2 BENCHMARKS

Spotify Podcast

A set of approximately 100K podcast episodes, consisting of raw audio files and accompanying ASR transcripts. This represents over 47,000 hours of transcribed audio and is an order of magnitude larger than previous speech-to-text corpora.

3 PAPERS • NO BENCHMARKS YET

CAL10K (Computer Audition Lab 10000)

The CAL10K dataset (introduced as Swat10k) contains 10,870 songs that are weakly labelled using a tag vocabulary of 475 acoustic tags and 153 genre tags. All tags were harvested from Pandora’s website and derive from song annotations performed by expert musicologists involved with the Music Genome Project.

1 PAPER • NO BENCHMARKS YET
