8 dataset results for Music AND English

MusicCaps is a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts. For each 10-second music clip, MusicCaps provides:

37 PAPERS • 1 BENCHMARK

FineDance

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

7 PAPERS • 1 BENCHMARK

SymphonyNet

First large-scale symphony generation dataset.

7 PAPERS • 1 BENCHMARK

Multi-Label Classification Dataset Repository

For each dataset we provide a short description as well as some characterization metrics. It includes the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average Imbalance Ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep) and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is defined as cardinality divided by the number of labels. Diversity represents the percentage of labelsets present in the dataset divided by the number of possible labelsets. The avgIR measures the average degree of imbalance of all labels, the greater avgIR, the greater the imbalance of the dataset. Finally, rDep measures the proportion of pairs of labels that are dependent at 99% confidence. A broader description of all the characterization metrics and the used partition methods are described in

4 PAPERS • NO BENCHMARKS YET

Fingerprint Dataset

Fingerprint Dataset (Neural Audio Fingerprint Dataset)

This dataset includes all music sources, background noises and impulse-reponses (IR) samples and conversation speech that have been used in the work "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning" ICASSP 2021 (https://arxiv.org/abs/2010.11910).

2 PAPERS • NO BENCHMARKS YET

YouTube8M-MusicTextClips

The YouTube8M-MusicTextClips dataset consists of over 4k high-quality human text descriptions of music found in video clips from the YouTube8M dataset.

2 PAPERS • NO BENCHMARKS YET

Jam-ALT

Jam-ALT (JamALT: A Formatting-Aware Lyrics Transcription Benchmark)

JamALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.

1 PAPER • 5 BENCHMARKS

Song Describer Dataset

The Song Describer Dataset (SDD) contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for use in evaluation of models that address music-and-language (M&L) tasks such as music captioning, text-to-music generation and music-language retrieval.

1 PAPER • NO BENCHMARKS YET

Datasets

8 dataset results for Music AND English