Audio Tagging
41 papers with code • 1 benchmark • 8 datasets
Audio tagging is the task of predicting the tags of audio clips. Audio tagging tasks include music tagging, acoustic scene classification, audio event classification, and others.
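As a minimal illustration, audio tagging is usually framed as multi-label classification: the model scores every tag independently and each score is thresholded on its own. The logits and tag names below are made up for the example; in practice the logits come from a trained network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_tags(logits, tag_names, threshold=0.5):
    """Multi-label decision: each tag is an independent binary choice,
    so a clip can receive zero, one, or several tags at once."""
    return [name for name, z in zip(tag_names, logits)
            if sigmoid(z) >= threshold]

# Illustrative logits for a clip containing music and a barking dog.
tags = predict_tags([2.1, -1.3, 0.4], ["music", "speech", "dog_bark"])
```

The independent-sigmoid formulation (rather than a softmax) is what distinguishes tagging from single-label classification: overlapping sounds are the norm in real audio.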
Libraries
Use these libraries to find Audio Tagging models and implementations.
Latest papers
Perceptual Musical Features for Interpretable Audio Tagging
In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets.
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.
Audio classification with Dilated Convolution with Learnable Spacings
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
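The core idea can be sketched in one dimension: each kernel weight sits at a real-valued offset, and linear interpolation between the two nearest input samples makes that offset differentiable, so it can be trained by backpropagation. This is an illustrative sketch of the mechanism only, not the paper's implementation (which operates on 2-D spectrogram inputs):

```python
import math

def dcls_conv1d(x, weights, positions):
    """1-D convolution whose kernel taps sit at fractional positions.
    Linear interpolation between the two neighbouring input samples
    keeps the output differentiable w.r.t. each position, which is
    what lets the spacings be learned. Zero padding outside the input."""
    out = []
    for i in range(len(x)):
        acc = 0.0
        for w, p in zip(weights, positions):
            j = i + p                    # real-valued sample index
            j0 = math.floor(j)
            frac = j - j0
            v0 = x[int(j0)] if 0 <= j0 < len(x) else 0.0
            v1 = x[int(j0) + 1] if 0 <= j0 + 1 < len(x) else 0.0
            acc += w * ((1 - frac) * v0 + frac * v1)
        out.append(acc)
    return out

# One tap of weight 1.0 placed halfway between samples: the output is
# the midpoint of each neighbouring pair.
y = dcls_conv1d([1.0, 2.0, 3.0, 4.0], [1.0], [0.5])
```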
Audio Tagging on an Embedded Hardware Platform
In this paper, we analyze how the performance of large-scale pretrained audio neural networks designed for audio pattern recognition changes when deployed on hardware such as a Raspberry Pi.
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.
E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking.
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
However, such semantic consistency from the synchronization is hard to guarantee in unconstrained videos, due to the irrelevant modality noise and differentiated semantic correlation.
Zorro: the masked multimodal transformer
Attention-based models are appealing for multimodal processing because inputs from multiple modalities can be concatenated and fed to a single backbone network - thus requiring very little fusion engineering.
Ontology-aware Learning and Evaluation for Audio Tagging
The proposed metric, ontology-aware mean average precision (OmAP) addresses the weaknesses of mAP by utilizing the AudioSet ontology information during the evaluation.
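For context, the per-class average precision that mAP averages can be computed as below; OmAP then additionally reweights the evaluation using the AudioSet ontology, which is specific to the paper and not reproduced here:

```python
def average_precision(scores, labels):
    """Standard average precision for one tag: rank clips by score,
    then average the precision measured at each positive clip.
    `labels` are 0/1 ground-truth indicators."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, ap = 0, 0.0
    total = sum(labels)
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            ap += hits / rank            # precision at this positive
    return ap / total if total else 0.0

# Two positives: one ranked first (precision 1.0), one ranked third
# (precision 2/3), giving AP = (1 + 2/3) / 2 = 5/6.
ap = average_precision([0.9, 0.8, 0.1], [1, 0, 1])
```

mAP is then the unweighted mean of this quantity over all classes, which is exactly the property OmAP modifies: errors between ontologically close classes are treated differently from errors between distant ones.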
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
We provide models of different complexity levels, scaling from low-complexity models up to a new state-of-the-art performance of 0.483 mAP on AudioSet.
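A common way to distill a Transformer teacher into a CNN student for multi-label tagging is to train the student against a blend of the teacher's sigmoid outputs and the ground-truth labels. The loss below is a generic sketch of that idea; the `alpha` blend and the exact loss form are assumptions for illustration, not the paper's recipe:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def distillation_loss(student_logits, teacher_probs, true_labels, alpha=0.5):
    """Binary cross-entropy of the student's predictions against a mix of
    teacher soft targets and hard labels (hypothetical blend for the sketch).
    alpha=1.0 trains purely on the teacher; alpha=0.0 purely on labels."""
    eps = 1e-12  # numerical guard for log
    loss = 0.0
    for z, t, y in zip(student_logits, teacher_probs, true_labels):
        target = alpha * t + (1 - alpha) * y   # blended per-tag target
        p = sigmoid(z)
        loss += -(target * math.log(p + eps)
                  + (1 - target) * math.log(1 - p + eps))
    return loss / len(student_logits)

# Pure teacher target of 0.5 with a student logit of 0 (p = 0.5)
# gives the maximum-entropy BCE value, ln 2.
kd = distillation_loss([0.0], [0.5], [1.0], alpha=1.0)
```

Soft teacher targets carry inter-class similarity information that hard labels lack, which is a standard motivation for distilling large audio Transformers into cheaper CNNs.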