2 code implementations • NeurIPS 2021 • Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee
We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal.
1 code implementation • 5 Feb 2021 • Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee
Modern deep learning-based models have achieved outstanding performance improvements on speech enhancement tasks.
no code implementations • ICLR 2020 • Hyeong-Seok Choi, Changdae Park, Kyogu Lee
We analyze the extent to which the network can naturally disentangle two latent factors that contribute to the generation of a face image - one that comes directly from a speech signal and the other that is not related to it - and explore whether the network can learn to generate natural human face image distribution by modeling these factors.
no code implementations • 29 Oct 2019 • Juheon Lee, Hyeong-Seok Choi, Junghyun Koo, Kyogu Lee
In this study, we define the identity of the singer with two independent concepts - timbre and singing style - and propose a multi-singer singing synthesis system that can model them separately.
Sound • Audio and Speech Processing
no code implementations • 6 Aug 2019 • Juheon Lee, Hyeong-Seok Choi, Chang-Bin Jeon, Junghyun Koo, Kyogu Lee
In this paper, we propose an end-to-end Korean singing voice synthesis system from lyrics and a symbolic melody using the following three novel approaches: 1) phonetic enhancement masking, 2) local conditioning of text and pitch to the super-resolution network, and 3) conditional adversarial training.
Sound • Audio and Speech Processing
7 code implementations • ICLR 2019 • Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of the spectrogram while reusing the phase from the noisy speech for reconstruction.