Graph-based Representation of Audio signals for Sound Event Classification
In recent years there has been a considerable rise in interest in graph representation and learning techniques, especially for data with an intrinsically graph-like structure: social networks, molecular lattices, or semantic interactions, to name a few. In this paper, we propose a novel way to represent an audio signal by deriving a graph-based representation from its spectrogram, which can then be processed by established graph deep-neural-network techniques. We evaluate this approach on a Sound Event Classification task using the widely adopted ESC and UrbanSound8K datasets and compare it with a Convolutional Neural Network (CNN) based method. We show that the proposed graph-based approach is extremely compact and, used in conjunction with learned CNN features, yields a significant increase in classification accuracy over the baseline with more than 50 times fewer parameters than the original CNN method. This suggests that the proposed graph-based features can offer additional discriminative information on top of learned CNN features.
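The abstract describes deriving a graph from a spectrogram. As an illustration only (the paper does not specify the construction here), a minimal sketch of one common choice: treat each time frame as a node and connect it to its k most similar frames by cosine similarity. The function name, the k-NN construction, and the toy input are all assumptions for illustration, not the authors' method.

```python
import numpy as np


def spectrogram_to_knn_graph(spec, k=3):
    """Hypothetical sketch: build a k-NN graph over spectrogram frames.

    Each time frame (column of `spec`) becomes a node; edges link each
    frame to its k most similar frames by cosine similarity. Returns the
    node feature matrix and a symmetric adjacency matrix, the two inputs
    most GNN libraries expect.
    """
    frames = spec.T  # shape (T, F): one node per time frame
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    normed = frames / np.maximum(norms, 1e-12)   # avoid divide-by-zero
    sim = normed @ normed.T                      # cosine similarity
    np.fill_diagonal(sim, -np.inf)               # exclude self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nn = np.argpartition(sim[i], -k)[-k:]    # k nearest neighbors
        adj[i, nn] = 1.0
    adj = np.maximum(adj, adj.T)                 # symmetrize
    return frames, adj


# Toy "spectrogram": 8 frequency bins x 6 time frames
rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((8, 6)))
nodes, adj = spectrogram_to_knn_graph(spec, k=2)
```

The resulting `(nodes, adj)` pair could then be fed to any standard graph neural network layer; the actual graph construction used in the paper may differ.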