Acoustic Scene Classification
37 papers with code • 5 benchmarks • 10 datasets
The goal of acoustic scene classification is to assign a test recording to one of a set of predefined classes, each characterizing the environment in which the recording was made.
Sources: DCASE 2019, DCASE 2018
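As a minimal sketch of the task definition above, assume some model has already produced one score per predefined class for a recording; classification then means picking exactly one class. The class names, the function name `classify_scene`, and the score values are hypothetical and are not taken from any DCASE dataset or baseline.

```python
import numpy as np

# Hypothetical scene labels; each benchmark defines its own class list.
SCENE_CLASSES = ["airport", "bus", "park", "street_traffic"]


def classify_scene(scores, classes=SCENE_CLASSES):
    """Return the single predefined class with the highest score."""
    scores = np.asarray(scores, dtype=float)
    if scores.shape != (len(classes),):
        raise ValueError("expected exactly one score per predefined class")
    return classes[int(np.argmax(scores))]


# Made-up per-class scores for one test recording.
predicted = classify_scene([0.1, 0.2, 0.6, 0.1])
print(predicted)
```

The model that produces the scores is deliberately left out; the sketch only illustrates the closed-set, single-label nature of the task.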
Latest papers
Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift
In addition, given the abundance of unlabeled acoustic scene data in the real world, it is important to study ways to make use of this unlabeled data.
AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning
This paper presents AudioLog, a large language model (LLM)-powered audio logging system with hybrid token-semantic contrastive learning.
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.
Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification
The results show the feasibility of recognizing diverse acoustic scenes based on the audio event-relational graph.
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time.
Device-Robust Acoustic Scene Classification via Impulse Response Augmentation
However, we also show that DIR augmentation and Freq-MixStyle are complementary, achieving a new state-of-the-art performance on signals recorded by devices unseen during training.
Unsupervised Improvement of Audio-Text Cross-Modal Representations
In this paper, we study unsupervised approaches to improve the learning framework of such representations with unpaired text and audio.
CochlScene: Acquisition of acoustic scene data using crowdsourcing
This paper describes a pipeline for collecting acoustic scene data by using crowdsourcing.
Multi-dimensional Edge-based Audio Event Relational Graph Representation Learning for Acoustic Scene Classification
Experiments on a polyphonic acoustic scene dataset show that the proposed ERGL achieves competitive performance on ASC by using only a limited number of embeddings of audio events without any data augmentations.
Efficient Similarity-based Passive Filter Pruning for Compressing CNNs
However, the computational complexity of computing the pairwise similarity matrix is high, particularly when a convolutional layer has many filters.
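To illustrate why the pairwise similarity matrix becomes expensive, here is a rough sketch that flattens each filter of a convolutional layer into a vector and compares every pair. Cosine similarity, the function name `pairwise_filter_similarity`, and the layer shape are assumptions made for illustration; the excerpt does not say which similarity measure the paper uses.

```python
import numpy as np


def pairwise_filter_similarity(weights):
    """Cosine similarity between every pair of filters in one conv layer.

    weights: array of shape (n_filters, in_channels, kernel_h, kernel_w).
    Returns an (n_filters, n_filters) matrix.
    """
    n_filters = weights.shape[0]
    flat = weights.reshape(n_filters, -1)            # one row per filter
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)           # guard against zero filters
    return unit @ unit.T                             # n_filters x n_filters products


# Hypothetical layer: 64 filters, 32 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
layer_weights = rng.standard_normal((64, 32, 3, 3))
similarity = pairwise_filter_similarity(layer_weights)
print(similarity.shape)
```

With n filters of d weights each, the matrix has n² entries and the product costs on the order of n²·d operations, so the cost grows quadratically with the number of filters in the layer.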