AudioCaps is a dataset of sounds with event descriptions that was introduced for the task of audio captioning, with sounds sourced from the AudioSet dataset. Annotators were provided the audio tracks together with category hints (and with additional video hints if needed).
176 PAPERS • 10 BENCHMARKS
Audio-alpaca: A preference dataset for aligning text-to-audio models Audio-alpaca is a pairwise preference dataset containing about 15k (prompt,chosen, rejected) triplets where given a textual prompt, chosen is the preferred generated audio and rejected is the undesirable audio.
1 PAPER • NO BENCHMARKS YET