Audio Tagging

41 papers with code • 1 benchmark • 8 datasets

Audio tagging is the task of predicting the tags (labels) that apply to an audio clip. It covers music tagging, acoustic scene classification, audio event classification, and related tasks.
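To make the task concrete, here is a minimal sketch of the final tagging step, assuming the common multi-label setup in which a model emits one score (logit) per tag and every tag above a threshold is reported. The label names, logits, and the 0.5 threshold are hypothetical and do not come from any of the papers listed here.

```python
import math


def tag_clip(logits, labels, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(labels, probs) if p >= threshold]


# Hypothetical label set and per-tag model outputs for one clip.
labels = ["speech", "music", "dog_bark", "rain"]
logits = [2.1, -0.3, 1.2, -4.0]

print(tag_clip(logits, labels))  # ['speech', 'dog_bark']
```

Because each tag is scored independently, a clip can receive several tags or none, which is what separates tagging from single-label classification.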

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

nttcslab/m2d 26 Oct 2022

We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.

39
26 Oct 2022
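The masking step behind "masked patches" can be sketched as follows. This only illustrates how a masked-modeling setup might split the patches of a spectrogram into a visible set and a masked set; it is not the M2D implementation, and the function name, patch count, and 0.6 mask ratio are assumptions chosen for illustration.

```python
import random


def split_patches(num_patches, mask_ratio, seed=None):
    """Randomly partition patch indices into (visible, masked) lists."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = round(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Hypothetical example: 20 patches, 60% of them masked.
visible, masked = split_patches(20, 0.6, seed=0)
print(len(visible), len(masked))  # 8 12
```

The two lists are disjoint and together cover every patch, so one network can be fed the visible patches while the masked ones are held out as the source of the training signal.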

Contrastive Audio-Visual Masked Autoencoder

yuangongnd/cav-mae 2 Oct 2022

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

203
02 Oct 2022

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

zhaoyanpeng/vipant NAACL 2022

In a difficult zero-shot setting with no paired audio-text data, our model demonstrates state-of-the-art zero-shot performance on the ESC50 and US8K audio classification tasks, and even surpasses the supervised state of the art for Clotho caption retrieval (with audio queries) by 2.2% R@1.

19
16 Dec 2021

Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

RetroCirce/Zero_Shot_Audio_Source_Separation 15 Dec 2021

Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training.

166
15 Dec 2021

Efficient Training of Audio Transformers with Patchout

kkoutini/passt 11 Oct 2021

However, one of the main shortcomings of transformer models, compared to the well-established CNNs, is the computational complexity.

279
11 Oct 2021
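The complexity concern can be illustrated with simple arithmetic. Self-attention compares every input patch with every other patch, so that part of the cost grows with the square of the sequence length. The sketch below, which assumes that only the pairwise attention term is counted and that a fixed fraction of patches is dropped during training (the idea the title suggests), estimates the relative cost. The function name and numbers are hypothetical.

```python
def attention_cost_ratio(num_patches, keep_fraction):
    """Relative pairwise-attention cost after keeping a fraction of patches."""
    kept = round(num_patches * keep_fraction)
    return (kept * kept) / (num_patches * num_patches)


# Hypothetical example: keeping half of 100 patches.
print(attention_cost_ratio(100, 0.5))  # 0.25
```

Under these assumptions, halving the sequence length cuts the pairwise attention cost to a quarter. Other parts of a transformer scale linearly, so the overall saving would be smaller.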

Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection

965694547/hybrid-system-of-frame-wise-model-and-sedt 5 Oct 2021

A critical issue with the frame-based model is that it pursues the best frame-level prediction rather than the best event-level prediction.

22
05 Oct 2021
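The gap between frame-level and event-level prediction can be seen in the post-processing a frame-based model typically needs. The sketch below is a generic illustration and not the method of the paper above: it merges consecutive active frames into (onset, offset) events for one sound class. The function name and the 0.5-second hop are assumptions.

```python
def frames_to_events(active, hop_seconds):
    """Merge runs of active frames into (onset, offset) pairs in seconds."""
    events = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i  # a new event begins at this frame
        elif not is_active and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:  # event still running at the end of the clip
        events.append((start * hop_seconds, len(active) * hop_seconds))
    return events


# One wrong frame in the middle splits a single event into two.
print(frames_to_events([1, 1, 1, 1, 1], 0.5))  # [(0.0, 2.5)]
print(frames_to_events([1, 1, 0, 1, 1], 0.5))  # [(0.0, 1.0), (1.5, 2.5)]
```

In the second call, four of five frames are still correct, yet the event-level output changes from one event to two, which is why optimizing frame accuracy alone does not guarantee good event-level results.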

Weakly-Supervised Classification and Detection of Bird Sounds in the Wild.

kumar-shubham-ml/kaggle-birdclef-2021 CLEF 2021

It is easier to hear birds than to see them; however, they still play an essential role in nature and are excellent indicators of deteriorating environmental quality and pollution.

21
10 Jul 2021

The SJTU System for DCASE2021 Challenge Task 6: Audio Captioning Based on Encoder Pre-training and Reinforcement Learning

wsntxxn/AudioCaption DCASE Challenge 2021

This report proposes an audio captioning system for Task 6 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge.

30
06 Jul 2021

Cross-Referencing Self-Training Network for Sound Event Detection in Audio Mixtures

JHU-LCAP/CRSTmodel 27 May 2021

Sound event detection is an important facet of audio tagging that aims to identify sounds of interest and define both the sound category and time boundaries for each sound event in a continuous recording.

4
27 May 2021

A Modulation Front-End for Music Audio Tagging

rastegah/modnet 25 May 2021

Modulation filter bank representations, which have been actively researched as a basis for timbre perception, have the potential to facilitate the extraction of perceptually salient features.

3
25 May 2021