1 code implementation • 8 Aug 2023 • Paul Primus, Khaled Koutini, Gerhard Widmer
This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers.
no code implementations • 24 Aug 2022 • Paul Primus, Gerhard Widmer
The absence of large labeled datasets remains a significant challenge in many application areas of deep learning.
no code implementations • 24 Aug 2022 • Paul Primus, Gerhard Widmer
Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training.
1 code implementation • 5 Nov 2020 • Paul Primus, Verena Haunschmid, Patrick Praher, Gerhard Widmer
If no data with similar sounds and matching recording conditions is available, data sets with a larger diversity in these two dimensions are preferable.
1 code implementation • 27 Jul 2020 • Khaled Koutini, Hamid Eghbal-zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer
However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets.
no code implementations • 6 Jul 2020 • Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer
Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models.
1 code implementation • 4 Sep 2019 • Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, Gerhard Widmer
Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning.