Audio Classification
133 papers with code • 20 benchmarks • 35 datasets
Audio Classification is a machine learning task that involves identifying and tagging audio signals into different classes or categories. The goal of audio classification is to enable machines to automatically recognize and distinguish between different types of audio, such as music, speech, and environmental sounds.
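At its simplest, the pipeline is: extract a feature from the raw waveform, then map that feature to a class label. A minimal stdlib-only sketch (the signals, the zero-crossing-rate feature, and the threshold are illustrative choices, not a standard method):

```python
import math
import random

def zero_crossing_rate(signal):
    # Fraction of adjacent samples that change sign:
    # high for broadband noise, low for a pure tone.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(signal) - 1)

def make_tone(freq_hz, n=1600, sr=16000):
    # 0.1 s sine wave at `freq_hz`.
    return [math.sin(2 * math.pi * freq_hz * t / sr) for t in range(n)]

def make_noise(n=1600, seed=0):
    # 0.1 s of uniform white noise.
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(n)]

def classify(signal, threshold=0.25):
    # One-feature classifier: white noise crosses zero roughly half the
    # time, while a 200 Hz tone at 16 kHz crosses only ~2.5% of the time.
    return "noise" if zero_crossing_rate(signal) > threshold else "tone"

print(classify(make_tone(200)))  # tone
print(classify(make_noise()))    # noise
```

Real systems replace the hand-picked feature with a learned representation (e.g. a CNN or transformer over mel spectrograms), but the structure — features in, class label out — is the same.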
Latest papers
Investigating the Emergent Audio Classification Ability of ASR Foundation Models
Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings.
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.
Adversarial Fine-tuning using Generated Respiratory Sound to Address Class Imbalance
In this work, we propose a straightforward approach to augment imbalanced respiratory sound data using an audio diffusion model as a conditional neural vocoder.
Auto deep learning for bioacoustic signals
This study investigates the potential of automated deep learning to enhance the accuracy and efficiency of multi-class classification of bird vocalizations, compared against traditional manually-designed deep learning models.
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Audio Spectrogram Transformers are excellent at exploiting large datasets, creating powerful pre-trained models that surpass CNNs when fine-tuned on downstream tasks.
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition
Using a large multilingual audio corpus and self-supervised learning, CLARA develops speech representations enriched with emotions, advancing emotion-aware multilingual speech processing.
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M, a dataset pairing Video, Infrared, Depth, and Audio with their corresponding Language descriptions.
Audio classification with Dilated Convolution with Learnable Spacings
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation.
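The core trick is that each kernel weight sits at a continuous, learnable position rather than on a fixed dilation grid. A pure-Python sketch of the 1D forward construction (the linear-interpolation scatter and the example positions are a simplification for illustration, not the paper's exact formulation):

```python
import math

def build_dcls_kernel(weights, positions, size):
    # Scatter each weight at a fractional position inside a dense kernel
    # of length `size`, splitting it between the two nearest integer taps
    # (linear interpolation). Because the split is linear in the position,
    # gradients can flow back to the positions during training; here we
    # only show the forward build.
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        lo = int(math.floor(p))
        frac = p - lo
        kernel[lo] += w * (1 - frac)
        if lo + 1 < size:
            kernel[lo + 1] += w * frac
    return kernel

# Three learnable taps spread over a length-7 receptive field
# (positions are illustrative).
print(build_dcls_kernel([1.0, -0.5, 0.25], [0.0, 2.5, 6.0], 7))
```

The resulting dense kernel can be fed to an ordinary convolution, so the method reuses standard convolution machinery while learning where the taps go.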
EDAC: Efficient Deployment of Audio Classification Models For COVID-19 Detection
Various researchers have applied machine learning methods in an attempt to detect COVID-19.
AudRandAug: Random Image Augmentations for Audio Classification
To address this gap, we introduce AudRandAug, an adaptation of RandAug for audio data.
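The RandAug recipe that AudRandAug adapts is simple: sample N transformations from a fixed pool and apply them in sequence at a shared magnitude. A minimal waveform-domain sketch (the op pool, magnitude mapping, and parameter names are illustrative assumptions, not the paper's policy):

```python
import random

def gain(x, mag):
    # Scale amplitude; mag in [0, 1] maps to up to +6 dB.
    g = 10 ** (6 * mag / 20)
    return [s * g for s in x]

def time_shift(x, mag):
    # Circularly shift by up to half the clip length.
    k = int(mag * len(x) / 2)
    return x[k:] + x[:k]

def add_noise(x, mag, seed=0):
    # Mix in uniform noise scaled by mag.
    rng = random.Random(seed)
    return [s + mag * 0.1 * rng.uniform(-1, 1) for s in x]

OPS = [gain, time_shift, add_noise]

def rand_augment(x, n=2, mag=0.5, seed=0):
    # RandAug recipe: sample n ops (with replacement) at a shared magnitude.
    rng = random.Random(seed)
    for op in rng.choices(OPS, k=n):
        x = op(x, mag)
    return x

clip = [0.1] * 1000
aug = rand_augment(clip)
print(len(aug))  # every op preserves the clip length
```

With only two tunable knobs (n and mag), such policies are cheap to search compared with per-op probabilities and magnitudes.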