AudioCaps
Most implemented papers
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Audio captioning aims to generate text descriptions of audio clips.
Is my automatic audio captioning system so bad? SPIDEr-max: a metric to consider several caption candidates
Several complementary metrics, such as BLEU, CIDEr, SPICE, and SPIDEr, are used to compare a single automatic caption to one or several reference captions produced by human annotators; SPIDEr-max instead considers several caption candidates for each clip.
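As a minimal illustration of the SPIDEr-max idea, score every candidate caption and keep the best score. The `token_f1` function below is only a toy placeholder; a real implementation would plug in an actual SPIDEr scorer.

```python
def token_f1(candidate, reference):
    # Toy stand-in for SPIDEr (placeholder only): token-level F1 overlap.
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(cand), overlap / len(ref)
    return 2 * p * r / (p + r)

def spider_max(candidates, references, score_fn=token_f1):
    # SPIDEr-max: score every caption candidate against the references and
    # keep the best one, instead of judging only the single top caption.
    return max(
        sum(score_fn(c, r) for r in references) / len(references)
        for c in candidates
    )
```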
Accommodating Audio Modality in CLIP for Multimodal Processing
In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio multimodal processing.
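One plausible way to realize this, sketched here with illustrative components rather than the paper's actual architecture: train a small audio encoder to project into CLIP's joint embedding space, aligned to frozen CLIP text embeddings with a symmetric contrastive loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioBranch(nn.Module):
    # Assumed sizes: 64 mel bands in, CLIP's 512-dim joint space out.
    def __init__(self, n_mels=64, clip_dim=512):
        super().__init__()
        self.encoder = nn.GRU(n_mels, 256, batch_first=True)
        self.proj = nn.Linear(256, clip_dim)  # map into CLIP's space

    def forward(self, mel):                   # mel: (batch, time, n_mels)
        _, h = self.encoder(mel)
        return F.normalize(self.proj(h[-1]), dim=-1)

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE: matching audio/text pairs sit on the diagonal.
    logits = audio_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```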
Target Sound Extraction with Variable Cross-modality Clues
Automatic target sound extraction (TSE) is a machine learning approach that mimics the human auditory ability to attend to a sound source of interest within a mixture of sources.
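A minimal sketch of the general TSE pattern, with assumed module sizes: estimate a time-frequency mask for the mixture, conditioned (here via FiLM modulation, one common choice) on an embedding of the clue.

```python
import torch.nn as nn

class ClueConditionedMasker(nn.Module):
    # Illustrative sizes: 257 frequency bins, 128-dim clue embedding.
    def __init__(self, n_freq=257, clue_dim=128):
        super().__init__()
        self.film = nn.Linear(clue_dim, 2 * n_freq)   # FiLM scale & shift
        self.net = nn.Sequential(nn.Linear(n_freq, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, clue_emb):
        # mix_spec: (batch, time, n_freq) magnitude spectrogram of the mix;
        # clue_emb: (batch, clue_dim) embedding of the cross-modality clue.
        scale, shift = self.film(clue_emb).chunk(2, dim=-1)
        x = mix_spec * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        mask = self.net(x)                            # values in (0, 1)
        return mix_spec * mask                        # estimated target source
```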
Prefix tuning for automated audio captioning
Audio captioning aims to generate text descriptions from environmental sounds.
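The excerpt above only defines the task; the technique named in the title, prefix tuning, can be sketched roughly as follows. A small trainable mapper turns a frozen audio encoder's embedding into "prefix" vectors prepended to a frozen language model's input embeddings; all sizes are illustrative assumptions.

```python
import torch.nn as nn

class PrefixMapper(nn.Module):
    # Assumed dims: 512-dim audio embedding, 768-dim LM embedding space.
    def __init__(self, audio_dim=512, lm_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.lm_dim = prefix_len, lm_dim
        self.mlp = nn.Sequential(
            nn.Linear(audio_dim, lm_dim * prefix_len), nn.Tanh())

    def forward(self, audio_emb):             # audio_emb: (batch, audio_dim)
        prefix = self.mlp(audio_emb)
        return prefix.view(-1, self.prefix_len, self.lm_dim)

# Usage idea: torch.cat([mapper(audio_emb), caption_token_embeddings], dim=1)
# feeds the frozen LM; the caption loss backpropagates only into the mapper.
```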
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance on many natural language processing (NLP) tasks.
RECAP: Retrieval-Augmented Audio Captioning
We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio clip and on similar captions retrieved from a datastore.
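The retrieval step can be sketched as a simple nearest-neighbor lookup over a datastore of caption embeddings; the CLAP-style embeddings below are placeholders, and the prompt format is an assumption.

```python
import numpy as np

def retrieve_captions(audio_emb, datastore_embs, datastore_caps, k=4):
    # Cosine similarity between the query audio embedding and every stored
    # caption embedding; both are assumed to live in a shared CLAP-like space.
    q = audio_emb / np.linalg.norm(audio_emb)
    d = datastore_embs / np.linalg.norm(datastore_embs, axis=1, keepdims=True)
    sims = d @ q
    top = np.argsort(sims)[::-1][:k]          # indices of the k best matches
    return [datastore_caps[i] for i in top]

# The retrieved captions can then be concatenated into the decoder's context,
# e.g. "Similar sounds were described as: ... Describe this audio:".
```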
Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Diffusion models power the vast majority of text-to-audio (TTA) generation methods.
Weakly-supervised Automated Audio Captioning via text only training
Our approach leverages the similarity between audio and text embeddings in CLAP.
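A rough sketch of the text-only training idea: because CLAP places audio and text in a shared embedding space, a caption decoder can be trained on text embeddings alone and fed audio embeddings at inference. The noise injection below is a common trick for bridging the residual modality gap, stated here as an assumption rather than this paper's exact recipe.

```python
import torch

def training_input(clap_text_emb, noise_std=0.015):
    # Train time: the "audio" conditioning is simulated from the caption's
    # own CLAP text embedding, perturbed with a little Gaussian noise.
    return clap_text_emb + noise_std * torch.randn_like(clap_text_emb)

def inference_input(clap_audio_emb):
    # Test time: swap in the real CLAP audio embedding; since CLAP aligns
    # the two modalities, the decoder treats it like the embeddings it saw
    # during text-only training.
    return clap_audio_emb
```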
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
We also introduce a new training objective called masked codec modeling that improves the acoustic awareness of the pretrained language model.
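A hedged sketch of what a masked-codec-modeling objective can look like: discrete neural-codec tokens are randomly replaced by a mask id and the model is trained to recover them, BERT-style. The mask ratio and the -100 ignore-index convention below are assumptions, not the paper's stated values.

```python
import torch

def mask_codec_tokens(codec_tokens, mask_id, mask_ratio=0.15):
    # codec_tokens: (batch, seq_len) integer ids from a neural audio codec.
    mask = torch.rand_like(codec_tokens, dtype=torch.float) < mask_ratio
    corrupted = codec_tokens.masked_fill(mask, mask_id)
    # Compute the loss only at masked positions: -100 is PyTorch's
    # conventional ignore_index for cross-entropy.
    labels = codec_tokens.masked_fill(~mask, -100)
    return corrupted, labels
```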