Search Results for author: Kyogu Lee

Found 38 papers, 18 papers with code

Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling

no code implementations • 1 Apr 2024 • Injune Hwang, Kyogu Lee

Recently, there have been efforts to encode the linguistic information of speech using a self-supervised framework for speech synthesis.

Speaker Identification • Speech Synthesis

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations

no code implementations • 2 Feb 2024 • Jaeyeon Kim, Injune Hwang, Kyogu Lee

We propose a framework to learn semantics from raw audio signals using two types of representations, encoding contextual and phonetic information respectively.

Language Modelling • Spoken Language Understanding

DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper

1 code implementation • 8 Jan 2024 • Jayeon Yi, Junghyun Koo, Kyogu Lee

Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range.
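
For intuition, hard clipping is simply a clamp of the waveform to the supported range. A minimal NumPy illustration of the distortion itself (not the paper's declipping model):

```python
import numpy as np

def hard_clip(x, limit=1.0):
    # Samples outside [-limit, limit] are flattened, which is the
    # nonlinear distortion a declipper must undo.
    return np.clip(x, -limit, limit)

t = np.arange(16000) / 16000.0
clean = 1.5 * np.sin(2 * np.pi * 440 * t)  # overdriven 440 Hz tone
clipped = hard_clip(clean)                 # peaks cut off at +/-1
```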

Inverse Nonlinearity Compensation of Hyperelastic Deformation in Dielectric Elastomer for Acoustic Actuation

no code implementations • 8 Jan 2024 • Jin Woo Lee, Gwang Seok An, Jeong-Yun Sun, Kyogu Lee

This paper delves into the analysis of nonlinear deformation induced by dielectric actuation in pre-stressed ideal dielectric elastomers.

Numerical Integration

Combinatorial music generation model with song structure graph analysis

1 code implementation • 24 Dec 2023 • SeongHyeon Go, Kyogu Lee

In this work, we propose a symbolic music generation model with the song structure graph analysis network.

Music Classification • Music Generation

Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts

no code implementations • 22 Nov 2023 • Jayeon Yi, Sungho Lee, Kyogu Lee

At the heart of "rhythm games" - games where players must perform actions in sync with a piece of music - are "charts", the directives given to players.

Exploiting Time-Frequency Conformers for Music Audio Enhancement

no code implementations • 24 Aug 2023 • Yunkee Chae, Junghyun Koo, Sungho Lee, Kyogu Lee

With the proliferation of video platforms on the internet, recording musical performances on mobile devices has become commonplace.

Speech Enhancement

Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data

no code implementations • 24 Jul 2023 • Junghyun Koo, Yunkee Chae, Chang-Bin Jeon, Kyogu Lee

Music source separation (MSS) faces challenges due to the limited availability of correctly-labeled individual instrument tracks.

Instrument Recognition • Music Source Separation

Debiased Automatic Speech Recognition for Dysarthric Speech via Sample Reweighting with Sample Affinity Test

no code implementations • 22 May 2023 • Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee

Since ERM averages performance over all data samples regardless of group, such as healthy or dysarthric speakers, ASR systems trained with it are unaware of the performance disparities across groups.
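
To make the ERM limitation concrete, here is a schematic PyTorch sketch contrasting a group-blind average loss with a group-reweighted one; the weights are hypothetical and merely stand in for the paper's sample affinity test:

```python
import torch

losses = torch.tensor([0.2, 0.3, 1.5, 1.8])  # per-sample ASR losses
groups = torch.tensor([0, 0, 1, 1])          # 0: healthy, 1: dysarthric

erm = losses.mean()  # ERM: group-blind average hides the disparity

weights = torch.tensor([0.5, 2.0])              # hypothetical group weights
reweighted = (weights[groups] * losses).mean()  # upweights the weaker group
```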

Automatic Speech Recognition • speech-recognition +1

Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio

1 code implementation • 15 Nov 2022 • KyungSu Kim, Minju Park, Haesun Joung, Yunkee Chae, Yeongbeom Hong, SeongHyeon Go, Kyogu Lee

The Single-Instrument Encoder is trained to classify the instruments used in single-track audio, and we take its penultimate layer's activation as the instrument embedding.
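
A minimal PyTorch sketch of this idea (the layer sizes and class count are assumptions, not the paper's architecture): train a classifier on single-track audio and read the penultimate activation off as the embedding.

```python
import torch
import torch.nn as nn

class SingleInstrumentEncoder(nn.Module):
    def __init__(self, emb_dim=256, n_classes=32):  # sizes are assumptions
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, emb_dim), nn.ReLU(),      # penultimate layer
        )
        self.head = nn.Linear(emb_dim, n_classes)   # instrument classifier

    def forward(self, spec):       # spec: (batch, 1, mels, frames)
        emb = self.backbone(spec)  # instrument embedding
        return self.head(emb), emb # logits for training, embedding for retrieval
```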

Retrieval

MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation

1 code implementation • 14 Nov 2022 • Chang-Bin Jeon, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon, Kyogu Lee

Second, to overcome the absence of existing multi-singing datasets for training purposes, we present a strategy for constructing multiple-singing mixtures from various single-singing datasets.
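
In its simplest form, such a mixture is a gain-weighted sum of single-singing recordings; a schematic NumPy sketch (the random gain range and peak normalization are assumptions, not the paper's exact recipe):

```python
import numpy as np

def make_multi_singing_mixture(voices, rng, gain_db=(-6.0, 0.0)):
    n = min(len(v) for v in voices)             # truncate to shortest take
    mix = np.zeros(n)
    for v in voices:
        g = 10 ** (rng.uniform(*gain_db) / 20)  # random per-voice gain
        mix += g * v[:n]
    peak = np.abs(mix).max()
    return mix / peak if peak > 1.0 else mix    # keep the mixture unclipped

# usage: mix = make_multi_singing_mixture([v1, v2], np.random.default_rng(0))
```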

Music Source Separation • Super-Resolution

Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations

no code implementations • 11 Nov 2022 • Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee

However, the emotional latent space produced by existing models makes it difficult to control continuous emotional intensity, because features such as emotion and speaker identity are entangled.

Emotional Speech Synthesis

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

1 code implementation • 4 Nov 2022 • Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, Yuki Mitsufuji

We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song.

Contrastive Learning • Disentanglement +2

Pop2Piano: Pop Audio-based Piano Cover Generation

1 code implementation • 2 Nov 2022 • Jongho Choi, Kyogu Lee

Piano covers of pop music are enjoyed by many people.

Neural Fourier Shift for Binaural Speech Rendering

no code implementations • 2 Nov 2022 • Jin Woo Lee, Kyogu Lee

We present a neural network for rendering binaural speech from given monaural audio, position, and orientation of the source.

Exploring Train and Test-Time Augmentations for Audio-Language Learning

no code implementations • 31 Oct 2022 • Eungbeom Kim, Jinhee Kim, Yoori Oh, KyungSu Kim, Minju Park, Jaeheon Sim, Jinwoo Lee, Kyogu Lee

In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance.

Audio captioning • Audio to Text Retrieval +4

Global HRTF Interpolation via Learned Affine Transformation of Hyper-conditioned Features

1 code implementation • 6 Apr 2022 • Jin Woo Lee, Sungho Lee, Kyogu Lee

Especially for data-driven approaches, existing HRTF datasets differ in the spatial sampling distributions of source positions, posing a major problem when generalizing a method across multiple datasets.

Position

End-to-end Music Remastering System Using Self-supervised and Adversarial Training

1 code implementation • 17 Feb 2022 • Junghyun Koo, Seungryeol Paik, Kyogu Lee

Mastering is an essential step in music production, but it is also a challenging task typically left to experienced audio engineers, who adjust the tone, space, and volume of a song.

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

2 code implementations • NeurIPS 2021 • Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee

We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal.

Voice Conversion

Reverb Conversion of Mixed Vocal Tracks Using an End-to-end Convolutional Deep Neural Network

no code implementations • 3 Mar 2021 • Junghyun Koo, Seungryeol Paik, Kyogu Lee

This method enables us to apply the reverb of a reference track to the source track on which the effect is desired.

Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

1 code implementation • 5 Feb 2021 • Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee

Modern deep learning-based models have achieved outstanding performance improvements on speech enhancement tasks.

Denoising • Speech Enhancement

Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition

no code implementations • 9 Sep 2020 • Junghyun Koo, Jie Hwan Lee, Jaewoo Pyo, Yujin Jo, Kyogu Lee

In this work, we exploit various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network, with a small dataset provided by the ADReSS Challenge at INTERSPEECH 2020.

regression

From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

no code implementations • ICLR 2020 • Hyeong-Seok Choi, Changdae Park, Kyogu Lee

We analyze the extent to which the network can naturally disentangle two latent factors that contribute to the generation of a face image - one that comes directly from the speech signal and one that is unrelated to it - and explore whether the network can learn to generate the distribution of natural human face images by modeling these factors.

VirtuosoNet: A Hierarchical RNN-based System for Modeling Expressive Piano Performance

1 code implementation • ISMIR 2019 • Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Kyogu Lee, Juhan Nam

In this paper, we present our application of deep neural networks to modeling piano performance, imitating pianists' expressive control of tempo, dynamics, articulation, and pedaling.

Music Performance Rendering

Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System

no code implementations • 29 Oct 2019 • Juheon Lee, Hyeong-Seok Choi, Junghyun Koo, Kyogu Lee

In this study, we define the identity of the singer with two independent concepts - timbre and singing style - and propose a multi-singer singing synthesis system that can model them separately.

Sound • Audio and Speech Processing

Adversarially Trained End-to-end Korean Singing Voice Synthesis System

no code implementations • 6 Aug 2019 • Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee

In this paper, we propose an end-to-end Korean singing voice synthesis system from lyrics and a symbolic melody using the following three novel approaches: 1) phonetic enhancement masking, 2) local conditioning of text and pitch to the super-resolution network, and 3) conditional adversarial training.

Sound • Audio and Speech Processing

Phase-aware Speech Enhancement with Deep Complex U-Net

7 code implementations • ICLR 2019 • Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee

Most deep learning-based models for speech enhancement have focused on estimating the magnitude spectrogram while reusing the phase of the noisy speech for reconstruction.
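
That baseline pipeline looks roughly like the SciPy sketch below, where `mask_fn` is a hypothetical stand-in for a trained magnitude-mask network:

```python
import numpy as np
from scipy.signal import stft, istft

def magnitude_only_enhance(noisy, mask_fn, fs=16000):
    _, _, Z = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    est_mag = mask_fn(mag) * mag          # estimate magnitude only
    Z_hat = est_mag * np.exp(1j * phase)  # reuse the noisy phase as-is
    _, enhanced = istft(Z_hat, fs=fs, nperseg=512)
    return enhanced
```

A complex-valued mask, as in Deep Complex U-Net, modifies both magnitude and phase instead of reusing the noisy phase.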

Speech Enhancement

Sequential Skip Prediction with Few-shot in Streamed Music Contents

1 code implementation • 24 Jan 2019 • Sungkyun Chang, Seungjin Lee, Kyogu Lee

This paper provides an outline of the algorithms submitted for the WSDM Cup 2019 Spotify Sequential Skip Prediction Challenge (team name: mimbres).

Few-Shot Learning • Metric Learning +1

Music Source Separation Using Stacked Hourglass Networks

4 code implementations • 22 May 2018 • Sungheon Park, Tae-hoon Kim, Kyogu Lee, Nojun Kwak

In this paper, we propose a simple yet effective method for multiple music source separation using convolutional neural networks.

Sound • Audio and Speech Processing

Chord Generation from Symbolic Melody Using BLSTM Networks

2 code implementations • 4 Dec 2017 • Hyungui Lim, Seungyeon Rhyu, Kyogu Lee

Generating a chord progression from a monophonic melody is a challenging problem because a chord progression requires a series of layered notes played simultaneously.
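
A minimal PyTorch sketch of a BLSTM mapping a melody sequence to per-step chord labels (the embedding size, hidden size, and chord vocabulary are hypothetical, not taken from the paper):

```python
import torch
import torch.nn as nn

class MelodyToChord(nn.Module):
    def __init__(self, n_pitches=128, hidden=128, n_chords=24):
        super().__init__()
        self.emb = nn.Embedding(n_pitches, 64)
        self.blstm = nn.LSTM(64, hidden, batch_first=True,
                             bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_chords)

    def forward(self, notes):      # notes: (batch, time) MIDI pitches
        h, _ = self.blstm(self.emb(notes))
        return self.out(h)         # chord logits at each time step
```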

Audio Cover Song Identification using Convolutional Neural Network

no code implementations • 1 Dec 2017 • Sungkyun Chang, Juheon Lee, Sang Keun Choe, Kyogu Lee

To do this, we first build a CNN that takes as input a cross-similarity matrix generated from a pair of songs.
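
A sketch of such an input (the choice of chroma features here is an assumption): frame-wise chroma from each song, normalized and correlated into a 2-D cross-similarity matrix a CNN can consume.

```python
import numpy as np
import librosa

def cross_similarity(y_a, y_b, sr=22050):
    ca = librosa.feature.chroma_cqt(y=y_a, sr=sr)        # (12, Ta)
    cb = librosa.feature.chroma_cqt(y=y_b, sr=sr)        # (12, Tb)
    ca /= np.linalg.norm(ca, axis=0, keepdims=True) + 1e-8
    cb /= np.linalg.norm(cb, axis=0, keepdims=True) + 1e-8
    return ca.T @ cb   # (Ta, Tb) cosine similarity per frame pair
```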

Cover song identification • Relation

Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

no code implementations • 21 Jan 2017 • Sungkyun Chang, Kyogu Lee

Most previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system, which inherently had difficulty adapting the speech model to individual singers.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Deep convolutional neural networks for predominant instrument recognition in polyphonic music

1 code implementation • 31 May 2016 • Yoonchang Han, Jaehun Kim, Kyogu Lee

We train our network on fixed-length music excerpts with a single labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal of variable length.
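
One common way to get variable-length estimation from a fixed-length classifier (shown here as an assumption, not necessarily the paper's exact aggregation) is to slide the model over the signal, average its per-class scores, and threshold:

```python
import numpy as np

def predict_instruments(model, x, win=22050, hop=11025, thresh=0.5):
    # model: fixed-length classifier returning per-class sigmoid scores;
    # assumes len(x) >= win so every excerpt has the expected length
    outs = [model(x[s:s + win])
            for s in range(0, len(x) - win + 1, hop)]
    mean_scores = np.mean(outs, axis=0)           # aggregate over excerpts
    return np.flatnonzero(mean_scores >= thresh)  # predominant classes
```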

Information Retrieval • Instrument Recognition +3

A Deep Bag-of-Features Model for Music Auto-Tagging

1 code implementation • 20 Aug 2015 • Juhan Nam, Jorge Herrera, Kyogu Lee

Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms.

Audio Classification • Information Retrieval +4
