no code implementations • 1 Apr 2024 • Injune Hwang, Kyogu Lee
Recently, there have been efforts to encode the linguistic information of speech using a self-supervised framework for speech synthesis.
no code implementations • 2 Feb 2024 • Jaeyeon Kim, Injune Hwang, Kyogu Lee
We propose a framework to learn semantics from raw audio signals using two types of representations, encoding contextual and phonetic information respectively.
no code implementations • 27 Jan 2024 • Haesun Joung, Kyogu Lee
Music auto-tagging is crucial for enhancing music discovery and recommendation.
1 code implementation • 8 Jan 2024 • Jayeon Yi, Junghyun Koo, Kyogu Lee
Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range.
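Clipping as described above can be sketched minimally as a hard clamp; the threshold and signal below are hypothetical illustrations, not the paper's actual declipping method.

```python
import numpy as np

def hard_clip(x: np.ndarray, limit: float = 1.0) -> np.ndarray:
    """Clamp samples that exceed the supported range [-limit, +limit]."""
    return np.clip(x, -limit, limit)

# A sine wave driven past the limit gets flattened at its peaks,
# which is the nonlinear distortion described above.
t = np.linspace(0, 1, 8000, endpoint=False)
clean = 1.5 * np.sin(2 * np.pi * 440 * t)   # peaks at roughly +/-1.5
clipped = hard_clip(clean, limit=1.0)
```

Declipping is the inverse problem: recovering the flattened peaks from the clipped signal.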
no code implementations • 8 Jan 2024 • Jin Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee
This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers.
1 code implementation • 24 Dec 2023 • SeongHyeon Go, Kyogu Lee
In this work, we propose a symbolic music generation model with the song structure graph analysis network.
no code implementations • 22 Nov 2023 • Jayeon Yi, Sungho Lee, Kyogu Lee
In the heart of "rhythm games" - games where players must perform actions in sync with a piece of music - are "charts", the directives to be given to players.
no code implementations • 24 Aug 2023 • Yunkee Chae, Junghyun Koo, Sungho Lee, Kyogu Lee
With the proliferation of video platforms on the internet, recording musical performances by mobile devices has become commonplace.
no code implementations • 24 Jul 2023 • Junghyun Koo, Yunkee Chae, Chang-Bin Jeon, Kyogu Lee
Music source separation (MSS) faces challenges due to the limited availability of correctly-labeled individual instrument tracks.
no code implementations • 22 May 2023 • Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee
Since empirical risk minimization (ERM) averages performance over all data samples regardless of group membership (e.g., healthy vs. dysarthric speakers), ASR systems trained with it are unaware of performance disparities across groups.
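The contrast between ERM and a group-aware objective can be sketched as follows; the per-sample losses and group labels are hypothetical toy values, and the worst-group objective shown is one common group-robust alternative, not necessarily the paper's exact formulation.

```python
import numpy as np

def erm_loss(losses: np.ndarray, groups: np.ndarray) -> float:
    """ERM: average over all samples, ignoring group membership."""
    return float(losses.mean())

def worst_group_loss(losses: np.ndarray, groups: np.ndarray) -> float:
    """Group-aware objective: the mean loss of the worst-performing group."""
    return max(float(losses[groups == g].mean()) for g in np.unique(groups))

# Hypothetical per-sample losses: group 0 = majority, group 1 = minority.
losses = np.array([0.1, 0.1, 0.1, 0.1, 0.9, 1.1])
groups = np.array([0, 0, 0, 0, 1, 1])
```

Here ERM reports 0.4 while the minority group's mean loss is 1.0, illustrating the disparity ERM hides.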
1 code implementation • 15 Nov 2022 • KyungSu Kim, Minju Park, Haesun Joung, Yunkee Chae, Yeongbeom Hong, SeongHyeon Go, Kyogu Lee
The Single-Instrument Encoder is trained to classify the instruments used in single-track audio, and we take its penultimate layer's activation as the instrument embedding.
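Taking a penultimate-layer activation as an embedding can be sketched with a toy classifier; the layer sizes, class count, and random weights below are hypothetical stand-ins, not the Single-Instrument Encoder's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-instrument classifier: features -> hidden -> class logits.
W1, b1 = rng.normal(size=(128, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)   # 10 hypothetical instrument classes

def instrument_embedding(features: np.ndarray) -> np.ndarray:
    """Return the penultimate-layer (hidden) activation as the embedding,
    discarding the final classification logits."""
    hidden = np.maximum(features @ W1 + b1, 0.0)   # ReLU hidden layer
    return hidden

x = rng.normal(size=128)
emb = instrument_embedding(x)
```

The classifier head (`W2`, `b2`) is only used during training; at embedding time the hidden activation is the output.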
1 code implementation • 14 Nov 2022 • Chang-Bin Jeon, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon, Kyogu Lee
Second, to overcome the absence of existing multi-singing datasets for training purposes, we present a strategy for constructing multiple-singing mixtures from various single-singing datasets.
no code implementations • 11 Nov 2022 • Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee
However, with the emotional latent space produced by existing models it is difficult to control continuous emotional intensity, because features such as emotion and speaker identity are entangled.
1 code implementation • 4 Nov 2022 • Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, Yuki Mitsufuji
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song.
1 code implementation • 2 Nov 2022 • Jongho Choi, Kyogu Lee
Piano covers of pop music are enjoyed by many people.
no code implementations • 2 Nov 2022 • Jin Woo Lee, Kyogu Lee
We present a neural network for rendering binaural speech from given monaural audio, position, and orientation of the source.
no code implementations • 31 Oct 2022 • Eungbeom Kim, Jinhee Kim, Yoori Oh, KyungSu Kim, Minju Park, Jaeheon Sim, Jinwoo Lee, Kyogu Lee
In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance.
Ranked #2 on Audio to Text Retrieval on AudioCaps
no code implementations • 28 Jul 2022 • Minju Park, Kyogu Lee
Advanced music recommendation systems are emerging alongside developments in machine learning.
no code implementations • 6 Apr 2022 • Jin Woo Lee, Eungbeom Kim, Junghyun Koo, Kyogu Lee
Our study allows us to analyze which attribute of speech signals is advantageous for the CM systems.
1 code implementation • 6 Apr 2022 • Jin Woo Lee, Sungho Lee, Kyogu Lee
Especially for the data-driven approaches, existing HRTF datasets differ in spatial sampling distributions of source positions, posing a major problem when generalizing the method across multiple datasets.
1 code implementation • 17 Feb 2022 • Junghyun Koo, Seungryeol Paik, Kyogu Lee
Mastering is an essential step in music production, but it is also a challenging task typically left to experienced audio engineers, who adjust the tone, space, and volume of a song.
2 code implementations • NeurIPS 2021 • Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee
We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal.
no code implementations • 3 Mar 2021 • Junghyun Koo, Seungryeol Paik, Kyogu Lee
This method enables us to apply the reverb of the reference track to the source track to which the effect is desired.
1 code implementation • 5 Feb 2021 • Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee
Modern deep learning-based models have achieved outstanding performance improvements on speech enhancement tasks.
3 code implementations • 22 Oct 2020 • Sungkyun Chang, Donmoon Lee, Jeongsoo Park, Hyungui Lim, Kyogu Lee, Karam Ko, Yoonchang Han
Most existing audio fingerprinting systems have limitations when used for highly specific audio retrieval at scale.
no code implementations • 9 Sep 2020 • Junghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee
In this work, we exploit various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network, with a small dataset provided by the ADReSS Challenge at INTERSPEECH 2020.
no code implementations • ICLR 2020 • Hyeong-Seok Choi, Changdae Park, Kyogu Lee
We analyze the extent to which the network can naturally disentangle two latent factors that contribute to the generation of a face image - one that comes directly from a speech signal and the other that is not related to it - and explore whether the network can learn to generate natural human face image distribution by modeling these factors.
1 code implementation • ISMIR 2019 • Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Kyogu Lee, Juhan Nam
In this paper, we present our application of a deep neural network to modeling piano performance, imitating pianists' expressive control of tempo, dynamics, articulation, and pedaling.
no code implementations • 29 Oct 2019 • Juheon Lee, Hyeong-Seok Choi, Junghyun Koo, Kyogu Lee
In this study, we define the identity of the singer with two independent concepts - timbre and singing style - and propose a multi-singer singing synthesis system that can model them separately.
Sound · Audio and Speech Processing
no code implementations • 6 Aug 2019 • Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee
In this paper, we propose an end-to-end Korean singing voice synthesis system from lyrics and a symbolic melody using the following three novel approaches: 1) phonetic enhancement masking, 2) local conditioning of text and pitch to the super-resolution network, and 3) conditional adversarial training.
Sound · Audio and Speech Processing
7 code implementations • ICLR 2019 • Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee
Most deep learning-based models for speech enhancement have focused mainly on estimating the magnitude spectrogram while reusing the phase of the noisy speech for reconstruction.
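The magnitude-plus-noisy-phase reconstruction described above can be sketched on a single spectrum; the random signal and the 0.5x "enhanced" magnitude below are hypothetical stand-ins for a network's output.

```python
import numpy as np

def reconstruct_with_noisy_phase(enhanced_mag: np.ndarray,
                                 noisy_spec: np.ndarray) -> np.ndarray:
    """Combine an estimated magnitude with the phase of the noisy spectrum."""
    phase = np.angle(noisy_spec)
    return enhanced_mag * np.exp(1j * phase)

# Hypothetical one-frame example: a noisy spectrum and a stand-in
# for a network's magnitude estimate.
noisy = np.fft.rfft(np.random.default_rng(0).normal(size=512))
mag_hat = np.abs(noisy) * 0.5
rebuilt = reconstruct_with_noisy_phase(mag_hat, noisy)
```

Phase-aware models instead estimate (or refine) the phase itself rather than reusing the noisy one.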
1 code implementation • 24 Jan 2019 • Sungkyun Chang, Seungjin Lee, Kyogu Lee
This paper provides an outline of the algorithms submitted for the WSDM Cup 2019 Spotify Sequential Skip Prediction Challenge (team name: mimbres).
Ranked #1 on Sequential skip prediction on MSSD
4 code implementations • 22 May 2018 • Sungheon Park, Tae-hoon Kim, Kyogu Lee, Nojun Kwak
In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks.
Sound · Audio and Speech Processing
2 code implementations • 4 Dec 2017 • Hyungui Lim, Seungyeon Rhyu, Kyogu Lee
Generating a chord progression from a monophonic melody is a challenging problem, because a chord progression consists of layered notes played simultaneously.
no code implementations • 1 Dec 2017 • Sungkyun Chang, Juheon Lee, Sang Keun Choe, Kyogu Lee
To do this, we first build the CNN, using as input a cross-similarity matrix generated from a pair of songs.
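A cross-similarity matrix between two songs can be sketched as a cosine similarity over per-frame features; the random "chroma" features below are hypothetical placeholders for whatever features the model actually uses.

```python
import numpy as np

def cross_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine cross-similarity between two per-frame feature sequences.
    Rows index frames of song A, columns index frames of song B."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
song_a = rng.normal(size=(100, 12))   # e.g., 12-dim chroma-like features
song_b = rng.normal(size=(80, 12))
sim = cross_similarity(song_a, song_b)
```

Diagonal stripes in such a matrix indicate matching passages, which is what the CNN then learns to detect.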
no code implementations • 21 Jan 2017 • Sungkyun Chang, Kyogu Lee
Most previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system, which inherently had difficulty adapting its speech model to individual singers.
Automatic Speech Recognition (ASR) +1
1 code implementation • 31 May 2016 • Yoonchang Han, Jaehun Kim, Kyogu Lee
We train our network from fixed-length music excerpts with a single-labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal with a variable length.
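Estimating an arbitrary number of predominant instruments from a variable-length signal can be sketched as aggregating per-window class probabilities and thresholding; the window outputs, class count, and threshold below are hypothetical, not the paper's reported values.

```python
import numpy as np

def aggregate_predictions(window_probs: np.ndarray,
                          threshold: float = 0.5) -> list:
    """Average per-window class probabilities over a variable-length clip,
    then report every instrument whose mean probability clears the threshold."""
    mean_probs = window_probs.mean(axis=0)
    return [i for i, p in enumerate(mean_probs) if p >= threshold]

# Hypothetical sliding-window outputs: 3 fixed-length windows, 4 instrument classes.
probs = np.array([[0.9, 0.2, 0.6, 0.1],
                  [0.8, 0.1, 0.7, 0.2],
                  [0.7, 0.3, 0.5, 0.1]])
detected = aggregate_predictions(probs)
```

Because the aggregation is over however many windows the clip yields, the same network trained on fixed-length excerpts handles variable-length audio.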
1 code implementation • 20 Aug 2015 • Juhan Nam, Jorge Herrera, Kyogu Lee
Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms.