Search Results for author: Paul Primus

Found 7 papers, 4 papers with code

Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets

1 code implementation • 8 Aug 2023 • Paul Primus, Khaled Koutini, Gerhard Widmer

This work presents a text-to-audio-retrieval system based on pre-trained text and spectrogram transformers.

Retrieval, Text to Audio Retrieval


Improving Natural-Language-based Audio Retrieval with Transfer Learning and Audio & Text Augmentations

no code implementations • 24 Aug 2022 • Paul Primus, Gerhard Widmer

The absence of large labeled datasets remains a significant challenge in many application areas of deep learning.

Data Augmentation, Natural Language Queries, +2


Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers

no code implementations • 24 Aug 2022 • Paul Primus, Gerhard Widmer

Standard machine learning models for tagging and classifying acoustic signals cannot handle classes that were not seen during training.

Audio Tagging, Classification, +2


Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples

1 code implementation • 5 Nov 2020 • Paul Primus, Verena Haunschmid, Patrick Praher, Gerhard Widmer

If no data with similar sounds and matching recording conditions is available, data sets with a larger diversity in these two dimensions are preferable.

Anomaly Detection, Binary Classification, +1


Receptive-Field Regularized CNNs for Music Classification and Tagging

1 code implementation • 27 Jul 2020 • Khaled Koutini, Hamid Eghbal-zadeh, Verena Haunschmid, Paul Primus, Shreyan Chowdhury, Gerhard Widmer

However, the MIR field is still dominated by the classical VGG-based CNN architecture variants, often in combination with more complex modules such as attention, and/or techniques such as pre-training on large datasets.

Classification, General Classification, +4


On Data Augmentation and Adversarial Risk: An Empirical Analysis

no code implementations • 6 Jul 2020 • Hamid Eghbal-zadeh, Khaled Koutini, Paul Primus, Verena Haunschmid, Michal Lewandowski, Werner Zellinger, Bernhard A. Moser, Gerhard Widmer

Data augmentation techniques have become standard practice in deep learning, as they have been shown to greatly improve the generalisation abilities of models.

Adversarial Attack, Data Augmentation


Exploiting Parallel Audio Recordings to Enforce Device Invariance in CNN-based Acoustic Scene Classification

1 code implementation • 4 Sep 2019 • Paul Primus, Hamid Eghbal-zadeh, David Eitelsebner, Khaled Koutini, Andreas Arzt, Gerhard Widmer

Distribution mismatches between the data seen at training and at application time remain a major challenge in all application areas of machine learning.

Acoustic Scene Classification, BIG-bench Machine Learning, +3

