2 code implementations • 18 Jan 2024 • Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung
The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad.
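A vocoder of this kind synthesizes a waveform by iteratively denoising Gaussian noise under mel-spectrogram conditioning. The sketch below shows a generic DDPM-style sampling loop; the `denoiser` network, the linear beta schedule, and all shapes are illustrative assumptions rather than FreGrad's actual design.

```python
import torch

def reverse_diffusion(denoiser, mel, num_steps=50, wav_len=16000):
    """Generic DDPM-style sampling for a mel-conditioned vocoder.
    `denoiser(x_t, mel, t)` is assumed to predict the noise at step t."""
    betas = torch.linspace(1e-4, 0.05, num_steps)  # illustrative schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, wav_len)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps_hat = denoiser(x, mel, torch.tensor([t]))
        # Remove the predicted noise component (DDPM posterior mean).
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
            / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # synthesized waveform

dummy = lambda x, mel, t: torch.zeros_like(x)  # stand-in for a trained network
wav = reverse_diffusion(dummy, mel=torch.randn(1, 80, 63))
```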
no code implementations • 17 Jan 2024 • Matthew C. McCallum, Florian Henkel, Jaehun Kim, Samuel E. Sandberg, Matthew E. P. Davies
We propose tempo translation functions that allow for efficient manipulation of tempo within a pre-existing embedding space whilst maintaining other properties such as genre.
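One simple way to realize such a translation function, sketched below, is an MLP that maps an embedding plus a log-tempo ratio to a residual shift in the same space. The actual translation functions in the paper are learned; everything here (512-d embeddings, the two-layer net) is a hypothetical placeholder.

```python
import torch
import torch.nn as nn

class TempoTranslator(nn.Module):
    """Sketch: shift a fixed audio embedding to a target tempo without
    re-encoding the audio. Architecture and sizes are assumptions."""

    def __init__(self, dim=512, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, emb, src_bpm, tgt_bpm):
        ratio = torch.log(tgt_bpm / src_bpm).unsqueeze(-1)  # (B, 1)
        # A residual formulation biases the map toward small edits, which
        # helps preserve non-tempo attributes such as genre.
        return emb + self.net(torch.cat([emb, ratio], dim=-1))

# Move a 120 BPM track's embedding to 140 BPM.
translator = TempoTranslator()
shifted = translator(torch.randn(1, 512),
                     torch.tensor([120.0]), torch.tensor([140.0]))
```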
no code implementations • 17 Jan 2024 • Florian Henkel, Jaehun Kim, Matthew C. McCallum, Samuel E. Sandberg, Matthew E. P. Davies
This paper addresses the problem of global tempo estimation in musical audio.
no code implementations • 17 Jan 2024 • Matthew C. McCallum, Matthew E. P. Davies, Florian Henkel, Jaehun Kim, Samuel E. Sandberg
Similarly, we show that the optimal selection of data augmentation strategies for contrastive learning of music audio embeddings is dependent on the downstream task, highlighting this as an important embedding design decision.
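For context, a SimCLR-style setup makes the role of augmentation explicit: two differently augmented views of each clip are pulled together under the NT-Xent loss below, so whatever the augmentation chain varies becomes an invariance of the embedding. The loss itself is standard; treating it as the training objective here is an assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """NT-Xent (SimCLR) loss. z1, z2: (B, D) embeddings of the same clips
    under two different augmentation chains."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
    sim = z @ z.t() / temperature                       # cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # mask self-pairs
    b = z1.size(0)
    # The positive for row i is its other view at i+B (and vice versa).
    targets = torch.cat([torch.arange(b) + b, torch.arange(b)])
    return F.cross_entropy(sim, targets)

# E.g., view 1 pitch-shifted, view 2 with added noise: the choice of
# chains is exactly the downstream-task-dependent design decision.
loss = nt_xent(torch.randn(32, 128), torch.randn(32, 128))
```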
no code implementations • 30 Oct 2023 • Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung
For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism.
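A minimal version of such a fusion module, assuming PyTorch and illustrative dimensions, lets one modality's features attend over the other's:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Sketch of cross-attention fusion: audio queries attend over the
    visual sequence. One attention layer and 256-d features are
    assumptions, not the paper's exact module."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_feats, visual_feats):
        fused, _ = self.attn(query=audio_feats, key=visual_feats,
                             value=visual_feats)
        # Residual connection keeps the original audio stream intact.
        return self.norm(audio_feats + fused)

fusion = CrossAttentionFusion()
audio = torch.randn(2, 100, 256)  # (batch, audio frames, dim)
video = torch.randn(2, 25, 256)   # (batch, video frames, dim)
out = fusion(audio, video)        # (2, 100, 256)
```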
no code implementations • 29 Aug 2023 • Ji-Hoon Kim, Jaehun Kim, Joon Son Chung
In this paper, we propose a novel lip-to-speech system that significantly improves the generation quality by alleviating the one-to-many mapping problem from multiple perspectives.
no code implementations • 12 Aug 2023 • Andres Ferraro, Jaehun Kim, Sergio Oramas, Andreas Ehmann, Fabien Gouyon
We demonstrate that our method successfully combines complementary information from diverse modalities and is more robust to missing modality data (i.e., it better handles the retrieval of artists whose available modality embeddings differ from the query artist's).
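A toy illustration of that robustness claim, with the masked-mean aggregation and cosine retrieval below standing in as assumptions for the paper's learned model: an artist is represented by averaging whichever modality embeddings exist for it.

```python
import numpy as np

def aggregate_modalities(embs):
    """Average the L2-normalized embeddings of whichever modalities are
    present (None marks a missing one)."""
    present = [e / np.linalg.norm(e) for e in embs if e is not None]
    agg = np.mean(present, axis=0)
    return agg / np.linalg.norm(agg)

def retrieve(query, catalog, k=5):
    """Rank catalog artists by cosine similarity to the query embedding."""
    sims = {name: float(vec @ query) for name, vec in catalog.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

# The query artist has audio and text embeddings but no image embedding.
audio, text, image = np.random.randn(128), np.random.randn(128), None
q = aggregate_modalities([audio, text, image])
catalog = {f"artist_{i}": aggregate_modalities([np.random.randn(128)])
           for i in range(100)}
print(retrieve(q, catalog))
```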
no code implementations • 11 Nov 2019 • Hyemin Ahn, Jaehun Kim, Kihyun Kim, Songhwai Oh
The trained dance pose generator, which is a generative autoregressive model, is able to synthesize a dance sequence longer than 5,000 pose frames.
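The autoregressive loop at the core of such a generator feeds each predicted pose back as input for the next step, so the sequence length is unbounded at inference time. The GRU backbone, 63-d pose vectors, and music conditioning by concatenation below are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseGenerator(nn.Module):
    """Sketch of a generative autoregressive pose model."""

    def __init__(self, pose_dim=63, music_dim=32, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(pose_dim + music_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)

    @torch.no_grad()
    def generate(self, seed_pose, music_feats):
        """seed_pose: (1, 1, pose_dim); music_feats: (1, T, music_dim)."""
        poses, pose, h = [], seed_pose, None
        for t in range(music_feats.size(1)):
            step = torch.cat([pose, music_feats[:, t:t + 1]], dim=-1)
            out, h = self.rnn(step, h)
            pose = self.head(out)   # feed the new pose back in next step
            poses.append(pose)
        return torch.cat(poses, dim=1)

gen = PoseGenerator()
# Nothing caps the loop length, hence sequences beyond 5,000 frames.
dance = gen.generate(torch.zeros(1, 1, 63), torch.randn(1, 5000, 32))
```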
no code implementations • 15 Apr 2019 • Jaehun Kim, Julián Urbano, Cynthia C. S. Liem, Alan Hanjalic
The underlying assumption is that, if a deep representation is to be trusted, distance consistency between known related points should be maintained in both the input audio space and the corresponding latent deep space.
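That assumption suggests a direct check, sketched here with Euclidean distances and Spearman rank correlation as assumed choices: compute pairwise distances among the same items in both spaces and measure how well their orderings agree.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def distance_consistency(audio_feats, latent_feats):
    """Rank agreement between pairwise distances in the input audio space
    and in the learned latent space; a value close to 1 means relations
    between known related points are preserved."""
    rho, _ = spearmanr(pdist(audio_feats), pdist(latent_feats))
    return rho

audio = np.random.randn(50, 1024)   # e.g., flattened spectrogram features
latent = np.random.randn(50, 128)   # deep representation of the same clips
print(distance_consistency(audio, latent))
```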
no code implementations • 12 Feb 2019 • Dae-Woong Jeong, Jaehun Kim, Young-Seok Kim, Tae-Ho Kim, Myungsu Chae
Existing high-performance deep learning models are computationally intensive.
1 code implementation • 5 May 2018 • Jaehun Kim, Minz Won, Xavier Serra, Cynthia C. S. Liem
The automated recognition of music genres from audio information is a challenging problem, as genre labels are subjective and noisy.
1 code implementation • 12 Feb 2018 • Jaehun Kim, Julián Urbano, Cynthia C. S. Liem, Alan Hanjalic
In this paper, we present the results of our investigation into the most important factors for generating deep representations for data and learning tasks in the music domain.
1 code implementation • 31 May 2016 • Yoonchang Han, Jaehun Kim, Kyogu Lee
We train our network from fixed-length music excerpts with a single-labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal with a variable length.
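Going from fixed-length, single-label training to variable-length, multi-instrument inference amounts to sliding the training excerpt length across the clip, aggregating per-window scores, and thresholding, as in the sketch below; the window sizes, mean aggregation, and fixed threshold are illustrative assumptions.

```python
import numpy as np

def predict_instruments(model, audio, win=43, hop=21, threshold=0.5):
    """`model(windows)` is assumed to return per-class sigmoid scores of
    shape (num_windows, num_classes); `audio` is a (time, features) array.
    Returns the indices of every instrument whose averaged score clears
    the threshold, i.e., an arbitrary number of predominant instruments."""
    starts = range(0, max(1, audio.shape[0] - win + 1), hop)
    windows = np.stack([audio[s:s + win] for s in starts])
    scores = model(windows).mean(axis=0)        # aggregate over windows
    return np.flatnonzero(scores >= threshold)

dummy = lambda x: np.random.rand(x.shape[0], 11)  # stand-in trained ConvNet
labels = predict_instruments(dummy, np.random.randn(500, 128))
```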