AudioCaps

20 papers with code • 0 benchmarks • 0 datasets

Benchmarks

These leaderboards are used to track progress in AudioCaps.

No evaluation results yet.

Most implemented papers

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

haoheliu/AudioLDM • 29 Jan 2023

By learning the latent representations of audio signals and their compositions without modeling the cross-modal relationship, AudioLDM is advantageous in both generation quality and computational efficiency.
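As its title indicates, AudioLDM applies diffusion to learned latent representations rather than directly to audio. For orientation, the standard DDPM forward process and noise-prediction loss, written for a latent z_0, are given next. This is the generic textbook formulation, not an excerpt from the paper: the noise schedule beta_s, the denoiser epsilon_theta and the conditioning embedding c are our own notation, and AudioLDM's exact parameterization and conditioning may differ.

```latex
\begin{aligned}
\bar{\alpha}_t &= \prod_{s=1}^{t} (1 - \beta_s) \\
z_t &= \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I) \\
\mathcal{L} &= \mathbb{E}_{z_0,\, \epsilon,\, t}
  \left[ \lVert \epsilon - \epsilon_\theta(z_t, t, c) \rVert_2^2 \right]
\end{aligned}
```

Because the denoiser works on a compact latent z_t, each denoising step is cheaper than it would be on a full-resolution representation, which is one plausible reading of the efficiency claim quoted above.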

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

OFA-Sys/ONE-PEACE • 18 May 2023

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.

Audio Retrieval with Natural Language Queries

oncescuandreea/audio-retrieval • 5 May 2021

We consider the task of retrieving audio using free-form natural language queries.
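Retrieval with free-form queries is usually scored with Recall@K, the fraction of queries whose correct clip appears among the K highest-scoring clips. A minimal sketch in Python follows. It assumes a precomputed similarity matrix where `similarity[i][j]` scores text query i against audio clip j and clip i is the only correct match for query i; the function name and this one-to-one layout are our assumptions, and datasets with several captions per clip need a small extension.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of text queries whose ground-truth clip ranks in the top k.

    Assumes similarity[i][j] scores text query i against audio clip j and that
    clip i is the single correct match for query i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Clip indices sorted from most to least similar for this query.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


sim = [
    [0.9, 0.1, 0.3],  # query 0: correct clip ranked 1st
    [0.2, 0.4, 0.8],  # query 1: correct clip ranked 2nd
    [0.1, 0.7, 0.6],  # query 2: correct clip ranked 2nd
]
print(recall_at_k(sim, 1), recall_at_k(sim, 2))  # R@1 = 1/3, R@2 = 1.0
```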

Audio Captioning Transformer

XinhaoMei/ACT • 21 Jul 2021

In this paper, we propose an Audio Captioning Transformer (ACT), which is a full Transformer network based on an encoder-decoder architecture and is totally convolution-free.
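One common way to feed a spectrogram to a Transformer without any convolution is ViT-style patch embedding: cut the log-mel spectrogram into non-overlapping patches, flatten each one, and apply a shared linear projection. The sketch assumes this front end for illustration only; the patch size, function names and toy weights are hypothetical and have not been checked against ACT's code, and positional embeddings plus the encoder-decoder stack itself are omitted.

```python
from typing import List

Matrix = List[List[float]]


def patchify(spec: Matrix, patch_t: int, patch_f: int) -> Matrix:
    """Split a [time][mel] spectrogram into flattened, non-overlapping patches.

    Patches are emitted in time-major order; trailing frames or mel bins that
    do not fill a whole patch are dropped.
    """
    n_t = len(spec) // patch_t
    n_f = (len(spec[0]) // patch_f) if spec else 0
    patches = []
    for ti in range(n_t):
        for fi in range(n_f):
            patches.append([
                spec[ti * patch_t + dt][fi * patch_f + df]
                for dt in range(patch_t)
                for df in range(patch_f)
            ])
    return patches


def project(patches: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Shared linear projection of each flattened patch; weight is [d_model][patch_dim]."""
    return [
        [sum(w * x for w, x in zip(row, p)) + b for row, b in zip(weight, bias)]
        for p in patches
    ]


# Toy 4x4 "spectrogram" whose value at (t, f) is 4 * t + f.
spec = [[4.0 * t + f for f in range(4)] for t in range(4)]
tokens = patchify(spec, 2, 2)  # 4 patches of 4 values each
embedded = project(tokens, [[1, 0, 0, 0], [0, 0, 0, 1]], [0.0, 0.0])
print(tokens[0], embedded[0])  # [0.0, 1.0, 4.0, 5.0] [0.0, 5.0]
```

In a trained model the projection weights are learned and d_model is much larger; the toy 2 x 4 weight matrix only keeps the arithmetic easy to follow.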

Can Audio Captions Be Evaluated with Image Caption Metrics?

blmoistawinde/fense • 10 Oct 2021

Current metrics are found to correlate poorly with human annotations on these datasets.
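Agreement between an automatic caption metric and human judgments is often quantified with a rank correlation. Below is a minimal Kendall's tau-a in plain Python. Using Kendall's tau (rather than, say, Spearman's rho or pairwise accuracy on caption pairs) is our assumption and is not necessarily the statistic reported in the paper; the scores in the example are made up.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either list count as neither concordant nor discordant but
    still appear in the denominator, which is what distinguishes tau-a from tau-b.
    """
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equally long score lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (metric_scores[i] - metric_scores[j]) * (human_scores[i] - human_scores[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores for four captions: the metric swaps one pair relative to humans.
metric = [0.2, 0.5, 0.4, 0.9]
human = [1, 2, 3, 4]
print(kendall_tau_a(metric, human))  # 5 concordant, 1 discordant, so 4/6
```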

Automated Audio Captioning by Fine-Tuning BART with AudioSet Tags

felixgontier/dcase2021aac • DCASE workshop 2021

Automated audio captioning is the multimodal task of describing environmental audio recordings with fluent natural language.

Audio Retrieval with Natural Language Queries: A Benchmark Study

akoepke/audio-retrieval-benchmark • 17 Dec 2021

Additionally, we introduce the SoundDescs benchmark, which consists of paired audio and natural language descriptions for a diverse collection of sounds that are complementary to those found in AudioCaps and Clotho.

Separate What You Describe: Language-Queried Audio Source Separation

liuxubo717/lass • 28 Mar 2022

In this paper, we introduce the task of language-queried audio source separation (LASS), which aims to separate a target source from an audio mixture based on a natural language query of the target source (e.g., "a man tells a joke followed by people laughing").
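A frequent design for this kind of separation is to predict a time-frequency mask from the mixture and the query, then multiply it with the mixture magnitude spectrogram. The sketch shows only that masking step. `mask_logits_fn` is a hypothetical stand-in for a text-conditioned network, and the sigmoid-bounded magnitude mask is our assumption rather than a confirmed detail of the paper's model.

```python
import math
from typing import Callable, List

Spec = List[List[float]]  # magnitude spectrogram indexed [time][freq]


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def separate(mixture_mag: Spec, query: str,
             mask_logits_fn: Callable[[Spec, str], Spec]) -> Spec:
    """Estimate the target magnitude as sigmoid(mask logits) * mixture magnitude.

    mask_logits_fn is a hypothetical text-conditioned network; it must return
    logits with the same [time][freq] shape as mixture_mag.
    """
    logits = mask_logits_fn(mixture_mag, query)
    return [
        [sigmoid(logit) * mag for logit, mag in zip(logit_row, mag_row)]
        for logit_row, mag_row in zip(logits, mixture_mag)
    ]


def dummy_model(mix: Spec, query: str) -> Spec:
    # Placeholder that ignores the query: keep the lower half of the bins, mute the rest.
    n_freq = len(mix[0])
    return [[10.0 if f < n_freq // 2 else -10.0 for f in range(n_freq)] for _ in mix]


mixture = [[1.0, 1.0, 1.0, 1.0]]
estimate = separate(mixture, "a man tells a joke followed by people laughing", dummy_model)
print(estimate)  # first two bins about 0.99995, last two about 4.5e-05
```

To get a waveform back, the masked magnitude is commonly combined with the mixture phase and inverted with an inverse STFT; that step is not shown.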

On Metric Learning for Audio-Text Cross-Modal Retrieval

XinhaoMei/audio-text_retrieval • 29 Mar 2022

We present an extensive evaluation of popular metric learning objectives on the AudioCaps and Clotho datasets.
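A widely used objective in this family is the bidirectional hinge-based triplet loss. The sketch assumes a batch similarity matrix where `sim[i][j]` scores audio clip i against caption j and the diagonal holds the matching pairs; `hardest=True` keeps only the hardest negative per anchor (the max-of-hinges variant popularized by VSE++), otherwise all negatives are summed. The function name, the 0.2 margin default and the per-batch averaging are our choices, not taken from the paper.

```python
from typing import Sequence


def triplet_loss(sim: Sequence[Sequence[float]], margin: float = 0.2,
                 hardest: bool = False) -> float:
    """Bidirectional hinge triplet loss, averaged over the batch.

    sim[i][j] scores audio clip i against caption j; diagonal entries are the
    positive pairs and every off-diagonal entry is treated as a negative.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Audio i as anchor against non-matching captions j.
        audio_to_text = [max(0.0, margin - pos + sim[i][j]) for j in range(n) if j != i]
        # Caption i as anchor against non-matching audio clips j.
        text_to_audio = [max(0.0, margin - pos + sim[j][i]) for j in range(n) if j != i]
        if hardest:
            total += max(audio_to_text, default=0.0) + max(text_to_audio, default=0.0)
        else:
            total += sum(audio_to_text) + sum(text_to_audio)
    return total / n


sim = [
    [0.8, 0.3],
    [0.7, 0.5],  # audio 1 scores higher with caption 0 than with its own caption
]
print(triplet_loss(sim), triplet_loss(sim, hardest=True))  # both about 0.25
```

With only one negative per anchor the two variants coincide; they diverge once a batch contains several violating negatives.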

Audio Retrieval with WavText5K and CLAP Training

microsoft/wavtext5k • 28 Sep 2022

In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval.
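CLAP-style training pulls matching audio and text embeddings together with a symmetric contrastive (InfoNCE-type) loss, in the manner of CLIP. The sketch computes that loss for one batch in plain Python. The fixed temperature of 0.07 and all function names are our assumptions; such models often learn the temperature instead, and the paper's exact objective may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def mean_cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(audio_emb: Sequence[Vector], text_emb: Sequence[Vector],
                               temperature: float = 0.07) -> float:
    """CLIP/CLAP-style loss: audio i and text i form the positive pair in a batch."""
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    # Cosine similarities scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t] for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    # Average the audio-to-text (rows) and text-to-audio (columns) directions.
    return 0.5 * (mean_cross_entropy_diagonal(logits) + mean_cross_entropy_diagonal(transposed))


aligned = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(symmetric_contrastive_loss(aligned, aligned))      # close to 0
print(symmetric_contrastive_loss(collapsed, collapsed))  # log(2)
```

With perfectly aligned, orthogonal embeddings the loss approaches zero, while a batch in which every embedding is identical gives log(batch size), since the model can do no better than guessing uniformly.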
