Sound Event Detection

74 papers with code • 4 benchmarks • 18 datasets

Sound Event Detection (SED) is the task of recognizing the sound events in a recording together with their start and end times. Sound events in real life do not always occur in isolation; they often overlap considerably. Recognizing such overlapping sound events is referred to as polyphonic SED.
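To make the task definition concrete, here is a minimal sketch of how per-frame class probabilities might be turned into labeled events with start and end times. The function name `frames_to_events`, the fixed threshold, and the hop size are illustrative assumptions and are not taken from any of the papers listed on this page. Each class is decoded independently, which is what allows overlapping (polyphonic) events.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float, float]  # (label, onset_seconds, offset_seconds)


def frames_to_events(
    frame_probs: Dict[str, List[float]],
    hop_seconds: float = 0.02,   # assumed time step between frames
    threshold: float = 0.5,      # assumed decision threshold
) -> List[Event]:
    """Convert per-class frame probabilities into (label, onset, offset) events.

    Hypothetical helper for illustration only: a frame is "active" when its
    probability reaches the threshold, and each run of consecutive active
    frames becomes one event.
    """
    events: List[Event] = []
    for label, probs in frame_probs.items():
        onset = None  # index of the first frame of the current active run
        for i, p in enumerate(probs):
            active = p >= threshold
            if active and onset is None:
                onset = i
            elif not active and onset is not None:
                events.append((label, onset * hop_seconds, i * hop_seconds))
                onset = None
        # Close an event that is still active at the end of the recording.
        if onset is not None:
            events.append((label, onset * hop_seconds, len(probs) * hop_seconds))
    # Sort by onset time, then label, for a stable listing.
    return sorted(events, key=lambda e: (e[1], e[0]))


# Two classes that overlap in time (a polyphonic case).
example = {
    "speech":   [0.1, 0.9, 0.8, 0.7, 0.2, 0.1],
    "dog_bark": [0.0, 0.0, 0.6, 0.9, 0.9, 0.8],
}
events = frames_to_events(example, hop_seconds=0.5)
for label, start, end in events:
    print(f"{label}: {start:.1f}s to {end:.1f}s")
```

In this toy input, the two events overlap between 1.0 s and 2.0 s. Practical systems usually add smoothing (for example median filtering) and class-specific thresholds before this step.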

Source: A report on sound event detection with different binaural features

Latest papers with no code

Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

no code yet • 15 Sep 2023

Then, the local consistency is adopted to encourage the model to leverage local features for frame-level predictions, and the global consistency is applied to force features to align with global prototypes through a specially designed contrastive loss.

Furnishing Sound Event Detection with Language Model Abilities

no code yet • 22 Aug 2023

Recently, the ability of language models (LMs) has attracted increasing attention in visual cross-modality.

DiffSED: Sound Event Detection with Denoising Diffusion

no code yet • 14 Aug 2023

In this work, we reformulate the SED problem by taking a generative learning perspective.

Auditory Neural Response Inspired Sound Event Detection Based on Spectro-temporal Receptive Field

no code yet • 20 Jun 2023

In this work, we use the spectro-temporal receptive field (STRF) as the kernel of the first convolutional layer in an SED model to extract a neural response from the input sound, making the SED model more similar to the human auditory system.

Channel-Spatial-Based Few-Shot Bird Sound Event Detection

no code yet • 18 Jun 2023

In this paper, we propose a model for bird sound event detection that focuses on a small number of training samples within the everyday long-tail distribution.

Semi-supervised Learning-based Sound Event Detection using Frequency Dynamic Convolution with Large Kernel Attention for DCASE Challenge 2023 Task 4

no code yet • 10 Jun 2023

The proposed FDY with LKA-CRNN with a BEATs embedding network is initially trained on the entire DCASE 2023 Task 4 dataset using the mean-teacher approach, generating pseudo-labels for the weakly labeled, unlabeled, and AudioSet data.
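As a rough illustration of the general mean-teacher idea mentioned above, the sketch below shows the two generic ingredients: the teacher's weights track an exponential moving average (EMA) of the student's weights, and the teacher's predictions are thresholded into pseudo-labels. This is not the paper's implementation. The names `ema_update` and `make_pseudo_labels`, the decay `alpha`, and the threshold are assumptions chosen for the example, and weights are represented as plain floats to keep it self-contained.

```python
from typing import Dict, List


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    alpha: float = 0.999,  # assumed EMA decay; closer to 1 means a slower teacher
) -> Dict[str, float]:
    """Return new teacher weights as an EMA of the student weights.

    Hypothetical helper: each parameter is a single float here, whereas a
    real model would hold tensors.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def make_pseudo_labels(teacher_probs: List[float], threshold: float = 0.5) -> List[int]:
    """Threshold the teacher's per-class probabilities into hard pseudo-labels."""
    return [1 if p >= threshold else 0 for p in teacher_probs]


# One EMA step: the teacher moves a fraction (1 - alpha) toward the student.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 2.0}
teacher = ema_update(teacher, student, alpha=0.5)

# Pseudo-labels for an unlabeled clip from (made-up) teacher outputs.
labels = make_pseudo_labels([0.9, 0.2, 0.5])
```

In a training loop, the EMA step would typically run after every student optimizer step, and the pseudo-labels would serve as targets for data that lacks annotations.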

Divided spectro-temporal attention for sound event localization and detection in real scenes for DCASE2023 challenge

no code yet • 5 Jun 2023

Localizing sounds and detecting events in different room environments is a difficult task, mainly due to the wide range of reflections and reverberations.

A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds

no code yet • 18 May 2023

Sound event detection (SED) entails identifying the type of sound and estimating its temporal boundaries from acoustic signals.

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

no code yet • 3 May 2023

This work investigates pretrained audio representations for few-shot Sound Event Detection.

Leveraging Audio-Tagging Assisted Sound Event Detection using Weakified Strong Labels and Frequency Dynamic Convolutions

no code yet • 25 Apr 2023

Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.