Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection

24 May 2019 · Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian

In this paper, we propose a specialized decision surface for weakly-supervised sound event detection (SED) and a disentangled feature (DF) for the multi-label problem in polyphonic SED. We approach SED as a multiple instance learning (MIL) problem and solve it with a neural network framework that includes a pooling module. General MIL approaches fall into two categories: instance-level approaches and embedding-level approaches. Embedding-level approaches tend to outperform instance-level approaches in bag-level classification, but current embedding-level approaches cannot provide instance-level probabilities; we present a method for generating them. Moreover, we propose a specialized decision surface (SDS) for embedding-level attention pooling, and we analyze and explain, from the perspective of the high-level feature space, why an embedding-level attention module with SDS performs better than other typical pooling modules. To address the imbalanced dataset and the co-occurrence of multiple categories in the polyphonic event detection task, we propose a DF that reduces interference among categories: it optimizes the high-level feature space by disentangling it, based on class-wise identifiable information, into multiple different subspaces. Experiments on the DCASE 2018 Task 4 dataset show that the proposed SDS and DF significantly improve the detection performance of the embedding-level MIL approach with an attention pooling module, and the resulting system outperforms the first-place system in the challenge by 6.6 percentage points.
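The distinction between the two MIL families can be made concrete with a minimal, single-class sketch. It uses generic textbook forms, not the authors' SDS or DF: the instance-level path classifies every frame and averages the frame probabilities, while the embedding-level path pools frame embeddings with softmax attention and classifies only the pooled bag embedding. The names `instance_level_pooling`, `embedding_level_attention_pooling`, the classifier weights `w`, `b`, and the attention vector `v` are hypothetical illustrations.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def instance_level_pooling(frames: List[Vector], w: Vector, b: float) -> float:
    """Instance-level MIL (generic sketch): classify each frame, then mean-pool
    the frame probabilities into a bag (clip-level) probability."""
    frame_probs = [sigmoid(dot(w, h) + b) for h in frames]
    return sum(frame_probs) / len(frame_probs)


def embedding_level_attention_pooling(
    frames: List[Vector], w: Vector, b: float, v: Vector
) -> float:
    """Embedding-level MIL (generic sketch): softmax attention over frames
    (hypothetical per-class attention vector v), weighted average of the frame
    embeddings, then a single classification of the pooled bag embedding.
    Note that only a bag probability comes out; no per-frame probabilities."""
    scores = [dot(v, h) for h in frames]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(frames[0])
    bag = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    return sigmoid(dot(w, bag) + b)


# Toy clip: a short event is active only in the middle frame.
frames = [[0.0, 0.0], [4.0, 0.0], [0.0, 0.0]]
w, b, v = [1.0, 0.0], -2.0, [1.0, 0.0]
print(round(instance_level_pooling(frames, w, b), 3))
print(round(embedding_level_attention_pooling(frames, w, b, v), 3))
```

On this toy clip the mean over frame probabilities dilutes the single active frame (about 0.373), while attention concentrates on it before classification (about 0.865). The embedding-level path, however, yields no per-frame probabilities, which is the gap the abstract says the authors' method addresses.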
