DCASENET: A joint pre-trained deep neural network for detecting and classifying acoustic scenes and events
Single-task deep neural networks, each performing one target task among diverse, closely related tasks, are being developed in the acoustic scene and event literature. A few studies have investigated combining such tasks, but this work is still at a preliminary stage. In this study, we propose an integrated deep neural network that can perform three tasks: acoustic scene classification, audio tagging, and sound event detection. Through extensive experiments on three datasets, we show that the proposed system, DCASENet, can either be used directly for any of the three tasks with competitive results or be further fine-tuned for a target task.