Search Results for author: Jordi Pons

Found 29 papers, 15 papers with code

Long-form music generation with latent diffusion

no code implementations • 16 Apr 2024 • Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure.

Music Generation

Fast Timing-Conditioned Latent Audio Diffusion

2 code implementations • 7 Feb 2024 • Zach Evans, CJ Carr, Josiah Taylor, Scott H. Hawley, Jordi Pons

Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding.

Audio Generation

1,450

GASS: Generalizing Audio Source Separation with Large-scale Data

no code implementations • 29 Sep 2023 • Jordi Pons, Xiaoyu Liu, Santiago Pascual, Joan Serrà

Here, we study a single general audio source separation (GASS) model trained to separate speech, music, and sound events in a supervised fashion with a large-scale dataset.

Audio Source Separation, Speech Separation

Mono-to-stereo through parametric stereo generation

no code implementations • 26 Jun 2023 • Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle

Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements.

CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

no code implementations • 16 Jun 2023 • Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley

Our results show the effectiveness of the proposed method, and that the pretrained diffusion prior can reduce the modality transfer gap.

Audio Synthesis

Towards Robust Image-in-Audio Deep Steganography

1 code implementation • 9 Mar 2023 • Jaume Ros, Margarita Geleta, Jordi Pons, Xavier Giro-i-Nieto

The field of steganography has experienced a surge of interest due to the recent advancements in AI-powered techniques, particularly in the context of multimodal setups that enable the concealment of signals within signals of a different nature.

Ranked #1 on Image Reconstruction on Audio Set

Image Reconstruction

Full-band General Audio Synthesis with Score-based Diffusion

no code implementations • 26 Oct 2022 • Santiago Pascual, Gautam Bhattacharya, Chunghsin Yeh, Jordi Pons, Joan Serrà

Recent works have shown the capability of deep generative models to tackle general audio synthesis from a single label, producing a variety of impulsive, tonal, and environmental sounds.

Audio Synthesis

Adversarial Permutation Invariant Training for Universal Sound Separation

no code implementations • 21 Oct 2022 • Emilian Postolache, Jordi Pons, Santiago Pascual, Joan Serrà

Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so.

Universal Speech Enhancement with Score-based Diffusion

no code implementations • 7 Jun 2022 • Joan Serrà, Santiago Pascual, Jordi Pons, R. Oguz Araz, Davide Scaini

We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.

Speech Enhancement

On loss functions and evaluation metrics for music source separation

no code implementations • 16 Feb 2022 • Enric Gusó, Jordi Pons, Santiago Pascual, Joan Serrà

We investigate which loss functions provide better separations via benchmarking an extensive set of those for music source separation.

Audio Source Separation, Benchmarking, +1

Upsampling layers for music source separation

no code implementations • 23 Nov 2021 • Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini

Upsampling artifacts are caused by problematic upsampling layers and by spectral replicas that emerge while upsampling.

Music Source Separation

On tuning consistent annealed sampling for denoising score matching

no code implementations • 8 Apr 2021 • Joan Serrà, Santiago Pascual, Jordi Pons

Score-based generative models provide state-of-the-art quality for image and audio synthesis.

Audio Synthesis, Denoising

Multichannel-based learning for audio object extraction

no code implementations • 11 Feb 2021 • Daniel Arteaga, Jordi Pons

The current paradigm for creating and deploying immersive audio content is based on audio objects, which are composed of an audio track and position metadata.

Sound, Audio and Speech Processing

On permutation invariant training for speech source separation

no code implementations • 9 Feb 2021 • Xiaoyu Liu, Jordi Pons

We study permutation invariant training (PIT), which targets the permutation ambiguity problem for speaker-independent source separation models.

Clustering, Speaker Separation

Upsampling artifacts in neural audio synthesis

1 code implementation • 27 Oct 2020 • Jordi Pons, Santiago Pascual, Giulio Cengarle, Joan Serrà

We then compare different upsampling layers, showing that nearest neighbor upsamplers can be an alternative to the problematic (but state-of-the-art) transposed and subpixel convolutions which are prone to introduce tonal artifacts.

Audio Signal Processing, Audio Synthesis

Automatic multitrack mixing with a differentiable mixing console of neural audio effects

1 code implementation • 20 Oct 2020 • Christian J. Steinmetz, Jordi Pons, Santiago Pascual, Joan Serrà

Applications of deep learning to automatic multitrack mixing are largely unexplored.

Audio and Speech Processing, Sound

111

SESQA: semi-supervised learning for speech quality assessment

no code implementations • 1 Oct 2020 • Joan Serrà, Jordi Pons, Santiago Pascual

Automatic speech quality assessment is an important, transversal task whose progress is hampered by the scarcity of human annotations, poor generalization to unseen recording conditions, and a lack of flexibility of existing approaches.

FSD50K: An Open Dataset of Human-Labeled Sound Events

8 code implementations • 1 Oct 2020 • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2M tracks from YouTube videos and encompassing over 500 sound classes.

TensorFlow Audio Models in Essentia

no code implementations • 16 Mar 2020 • Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, Xavier Serra

Essentia is a reference open-source C++/Python library for audio and music analysis.

Music Tagging, TAG

An empirical study of Conv-TasNet

1 code implementation • 20 Feb 2020 • Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar

Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.

Decoder

musicnn: Pre-trained convolutional neural networks for music audio tagging

4 code implementations • 14 Sep 2019 • Jordi Pons, Xavier Serra

Pronounced as "musician", the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn.

Audio Tagging, Transfer Learning

564

End-to-end music source separation: is it possible in the waveform domain?

2 code implementations • 29 Oct 2018 • Francesc Lluís, Jordi Pons, Xavier Serra

Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.

Ranked #26 on Music Source Separation on MUSDB18

Music Source Separation

220

Training neural audio classifiers with few data

2 code implementations • 24 Oct 2018 • Jordi Pons, Joan Serrà, Xavier Serra

We investigate supervised learning strategies that improve the training of neural network audio classifiers on small annotated collections.

Acoustic Scene Classification, General Classification, +2

General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline

3 code implementations • 26 Jul 2018 • Eduardo Fonseca, Manoj Plakal, Frederic Font, Daniel P. W. Ellis, Xavier Favory, Jordi Pons, Xavier Serra

The goal of the task is to build an audio tagging system that can recognize the category of an audio clip from a subset of 41 diverse categories drawn from the AudioSet Ontology.

Audio Tagging, Task 2

Randomly weighted CNNs for (music) audio classification

2 code implementations • 1 May 2018 • Jordi Pons, Xavier Serra

The computer vision literature shows that randomly weighted neural networks perform reasonably as feature extractors.

Sound, Audio and Speech Processing

142

End-to-end learning for music audio tagging at scale

4 code implementations • 7 Nov 2017 • Jordi Pons, Oriol Nieto, Matthew Prockup, Erik Schmidt, Andreas Ehmann, Xavier Serra

The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms.

Sound, Audio and Speech Processing

296

Audio to score matching by combining phonetic and duration information

1 code implementation • 12 Jul 2017 • Rong Gong, Jordi Pons, Xavier Serra

We approach the singing phrase audio to score matching problem by using phonetic and duration information - with a focus on studying the jingju a cappella singing case.

Sound

A Wavenet for Speech Denoising

7 code implementations • ICASSP 2018 2017 • Dario Rethage, Jordi Pons, Xavier Serra

In order to overcome this limitation, we propose an end-to-end learning method for speech denoising based on Wavenet.

Sound

660

Timbre Analysis of Music Audio Signals with Convolutional Neural Networks

3 code implementations • 20 Mar 2017 • Jordi Pons, Olga Slizovskaia, Rong Gong, Emilia Gómez, Xavier Serra

The focus of this work is to study how to efficiently tailor Convolutional Neural Networks (CNNs) towards learning timbre representations from log-mel magnitude spectrograms.

Sound
