no code implementations • 16 Apr 2024 • Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure.
2 code implementations • 7 Feb 2024 • Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, Jordi Pons
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding.
no code implementations • 29 Sep 2023 • Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà
Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.
no code implementations • 26 Jun 2023 • Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle
Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements.
no code implementations • 16 Jun 2023 • Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley
Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap.
1 code implementation • 9 Mar 2023 • Jaume Ros, Margarita Geleta, Jordi Pons, Xavier Giro-i-Nieto
The field of steganography has experienced a surge of interest due to the recent advancements in AI-powered techniques, particularly in the context of multimodal setups that enable the concealment of signals within signals of a different nature.
Ranked #1 on Image Reconstruction on Audio Set
no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà
Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.
no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà
Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so.
no code implementations • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini
We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.
no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà
We investigate which loss functions provide better separations via benchmarking an extensive set of those for music source separation.
no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini
Upsampling artifacts are caused by problematic upsampling layers and by spectral replicas that emerge while upsampling.
no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons
Score-based generative models provide state-of-the-art quality for image and audio synthesis.
no code implementations • 11 Feb 2021 • Daniel Arteaga, Jordi Pons
The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata.
Sound • Audio and Speech Processing
no code implementations • 9 Feb 2021 • Xiaoyu Liu, Jordi Pons
We study permutation invariant training (PIT), which targets the permutation ambiguity problem for speaker-independent source separation models.
1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà
We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions, which are prone to introducing tonal artifacts.
1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà
Applications of deep learning to automatic multitrack mixing are largely unexplored.
Audio and Speech Processing • Sound
no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual
Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.
8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra
Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.
no code implementations • 16 Mar 2020 • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra
Essentia is a reference open-source C++/Python library for audio and music analysis.
1 code implementation • 20 Feb 2020 • Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar
Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.
4 code implementations • 14 Sep 2019 • Jordi Pons, Xavier Serra
Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.
2 code implementations • 29 Oct 2018 • Francesc Lluís, Jordi Pons, Xavier Serra
Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.
Ranked #26 on Music Source Separation on MUSDB18
2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra
We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.
3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra
The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.
2 code implementations • 1 May 2018 • Jordi Pons, Xavier Serra
The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors.
Sound • Audio and Speech Processing
4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra
The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.
Sound • Audio and Speech Processing
1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra
We approach the singing phrase audio-to-score matching problem by using phonetic and duration information, with a focus on studying the jingju a cappella singing case.
Sound
7 code implementations • ICASSP 2018 2017 • Dario Rethage, Jordi Pons, Xavier Serra
In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet.
Sound
3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra
The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.
Sound