Audio Classification
133 papers with code • 20 benchmarks • 35 datasets
Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
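At its simplest, the pipeline is: extract a feature from the raw waveform, then map that feature to a class label. A minimal stdlib-only sketch (the signals, the zero-crossing-rate feature, and the threshold are illustrative choices, not a standard method):

```python
import math
import random

def zero_crossing_rate(signal):
    # Fraction of adjacent samples that change sign:
    # high for broadband noise, low for a pure tone.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)

def make_tone(freq_hz, n=1600, sr=16000):
    # 0.1 s sine wave at `freq_hz`.
    return [math.sin(2 * math.pi * freq_hz * t / sr) for t in range(n)]

def make_noise(n=1600, seed=0):
    # 0.1 s of uniform white noise.
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n)]

def classify(signal, threshold=0.25):
    # One-feature classifier: white noise crosses zero roughly half the
    # time, while a 200 Hz tone at 16 kHz crosses only ~2.5% of the time.
    return "noise" if zero_crossing_rate(signal) > threshold else "tone"

print(classify(make_tone(200)))  # tone
print(classify(make_noise()))    # noise
```

Real systems replace the hand-picked feature with a learned representation (e.g. a CNN or transformer over mel spectrograms), but the structure — features in, class label out — is the same.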
Latest papers
Investigating the Emergent Audio Classification Ability of ASR Foundation Models
Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings.
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance
In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder.
Auto deep learning for bioacoustic signals
This study investigates the potential of automated deep learning to enhance the accuracy and efficiency of multi-class classification of bird vocalizations, compared against traditional manually-designed deep learning models.
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
Using a large multilingual audio corpus and self-supervised learning, CLARA develops speech representations enriched with emotions, advancing emotion-aware multilingual speech processing.
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M, a dataset pairing Video, Infrared, Depth, and Audio with their corresponding Language descriptions.
Audio classification with Dilated Convolution with Learnable Spacings
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
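The core trick is that each kernel weight sits at a continuous, learnable position rather than on a fixed dilation grid. A pure-Python sketch of the 1D forward construction (the linear-interpolation scatter and the example positions are a simplification for illustration, not the paper's exact formulation):

```python
import math

def build_dcls_kernel(weights, positions, size):
    # Scatter each weight at a fractional position inside a dense kernel
    # of length `size`, splitting it between the two nearest integer taps
    # (linear interpolation). Because the split is linear in the position,
    # gradients can flow back to the positions during training; here we
    # only show the forward build.
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        lo = int(math.floor(p))
        frac = p - lo
        kernel[lo] += w * (1 - frac)
        if lo + 1 < size:
            kernel[lo + 1] += w * frac
    return kernel

# Three learnable taps spread over a length-7 receptive field
# (positions are illustrative).
print(build_dcls_kernel([1.0, -0.5, 0.25], [0.0, 2.5, 6.0], 7))
```

The resulting dense kernel can be fed to an ordinary convolution, so the method reuses standard convolution machinery while learning where the taps go.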
EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection
Various researchers have applied machine learning methods in an attempt to detect COVID-19.
AudRandAug: Random Image Augmentations for Audio Classification
To address this gap, we introduce AudRandAug, an adaptation of RandAug for audio data.
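The RandAug recipe that AudRandAug adapts is simple: sample N transformations from a fixed pool and apply them in sequence at a shared magnitude. A minimal waveform-domain sketch (the op pool, magnitude mapping, and parameter names are illustrative assumptions, not the paper's policy):

```python
import random

def gain(x, mag):
    # Scale amplitude; mag in [0, 1] maps to up to +6 dB.
    g = 10 ** (6 * mag / 20)
    return [s * g for s in x]

def time_shift(x, mag):
    # Circularly shift by up to half the clip length.
    k = int(mag * len(x) / 2)
    return x[k:] + x[:k]

def add_noise(x, mag, seed=0):
    # Mix in uniform noise scaled by mag.
    rng = random.Random(seed)
    return [s + mag * 0.1 * rng.uniform(-1, 1) for s in x]

OPS = [gain, time_shift, add_noise]

def rand_augment(x, n=2, mag=0.5, seed=0):
    # RandAug recipe: sample n ops (with replacement) at a shared magnitude.
    rng = random.Random(seed)
    for op in rng.choices(OPS, k=n):
        x = op(x, mag)
    return x

clip = [0.1] * 1000
aug = rand_augment(clip)
print(len(aug))  # every op preserves the clip length
```

With only two tunable knobs (n and mag), such policies are cheap to search compared with per-op probabilities and magnitudes.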