Search Results for author: Sungwon Kim

Found 18 papers, 12 papers with code

Scaling NVIDIA's Multi-speaker Multi-lingual TTS Systems with Zero-Shot TTS to Indic Languages

no code implementations 24 Jan 2024 Akshit Arora, Rohan Badlani, Sungwon Kim, Rafael Valle, Bryan Catanzaro

In Track 3, we utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as well as external datasets.

Voice Cloning

Interpretable Prototype-based Graph Information Bottleneck

1 code implementation NeurIPS 2023 Sangwoo Seo, Sungwon Kim, Chanyoung Park

In this work, we propose a novel framework of explainable GNNs, called interpretable Prototype-based Graph Information Bottleneck (PGIB), which incorporates prototype learning within the information bottleneck framework so that the prototypes capture the key subgraph of the input graph that is important for the model prediction.
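
For background, the information bottleneck framework referenced here trades off predictiveness against compression of the input graph; the generic objective is sketched below, noting that PGIB's additional prototype-learning terms are not spelled out in this excerpt.

```latex
% Generic (graph) information bottleneck objective -- background only.
% G: input graph, Y: label, Z: compressed bottleneck representation (e.g., a key subgraph).
% PGIB's prototype-learning terms are not reproduced here.
\min_{Z}\; -I(Z; Y) + \beta\, I(Z; G)
```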

Decision Making

Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer

1 code implementation NeurIPS 2023 Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy.

Unsupervised Episode Generation for Graph Meta-learning

1 code implementation 27 Jun 2023 Jihyeong Jung, Sangwoo Seo, Sungwon Kim, Chanyoung Park

Despite the effectiveness of graph contrastive learning (GCL) methods in the FSNC task without using label information, they mainly learn generic node embeddings without consideration of the downstream task to be solved, which may limit their performance in the FSNC task.

Contrastive Learning · Meta-Learning +2

Task-Equivariant Graph Few-shot Learning

1 code implementation 30 May 2023 Sungwon Kim, Junseok Lee, Namkyeong Lee, Wonjoong Kim, Seungyoon Choi, Chanyoung Park

To solve this problem, it is important for GNNs to be able to classify nodes with only a limited number of labeled nodes, a task known as few-shot node classification.

Few-Shot Learning · Node Classification

Conditional Graph Information Bottleneck for Molecular Relational Learning

1 code implementation 29 Apr 2023 Namkyeong Lee, Dongmin Hyun, Gyoung S. Na, Sungwon Kim, Junseok Lee, Chanyoung Park

Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, has seen a surge of interest in molecular sciences due to its wide range of applications.

Relational Reasoning

Predicting Density of States via Multi-modal Transformer

1 code implementation 13 Mar 2023 Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

The density of states (DOS) is a spectral property of materials, which provides fundamental insights on various characteristics of materials.

Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data

no code implementations 30 May 2022 Sungwon Kim, Heeseung Kim, Sungroh Yoon

We train the speaker-conditional diffusion model on large-scale untranscribed datasets for a classifier-free guidance method and further fine-tune the diffusion model on the reference speech of the target speaker for adaptation, which only takes 40 seconds.
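
As background, classifier-free guidance combines conditional and unconditional noise predictions from a single model at sampling time; the standard form is sketched below, with the guidance scale w and conditioning c as assumed placeholders, since the excerpt does not give Guided-TTS 2's exact formulation.

```latex
% Standard classifier-free guidance (generic form, not the paper's exact equations).
% c: conditioning (e.g., speaker reference), \varnothing: null condition, w: guidance scale.
\hat{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + w\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big)
```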

Perception Prioritized Training of Diffusion Models

5 code implementations CVPR 2022 Jooyoung Choi, Jungbeom Lee, Chaehun Shin, Sungwon Kim, Hyunwoo Kim, Sungroh Yoon

Diffusion models learn to restore noisy data, which is corrupted with different levels of noise, by optimizing the weighted sum of the corresponding loss terms, i.e., the denoising score matching loss.
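
The weighted objective described here is the standard denoising score matching loss summed over noise levels, sketched below; the specific per-level weighting that the paper proposes is not reproduced in this excerpt.

```latex
% Weighted denoising score matching objective for diffusion models (generic form).
% The paper's proposed choice of \lambda_t is not reproduced here.
L = \sum_{t=1}^{T} \lambda_t\, \mathbb{E}_{x_0, \epsilon}\Big[
      \big\| \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0
        + \sqrt{1 - \bar{\alpha}_t}\,\epsilon,\; t\big) \big\|^2 \Big]
```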

Denoising

Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

no code implementations 23 Nov 2021 Heeseung Kim, Sungwon Kim, Sungroh Yoon

For TTS synthesis, we guide the generative process of the diffusion model with a phoneme classifier trained on a large-scale speech recognition dataset.
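
Concretely, this kind of classifier guidance adds the gradient of the classifier's log-probability to the unconditional score during sampling; the generic form is sketched below, with the gradient scale s as an assumed hyperparameter not stated in the excerpt.

```latex
% Classifier-guided score (generic form). y: phoneme/text condition,
% p_\phi(y | x_t): the phoneme classifier, s: an assumed guidance scale.
\nabla_{x_t} \log p(x_t \mid y) \approx
  \nabla_{x_t} \log p_\theta(x_t) + s\, \nabla_{x_t} \log p_\phi(y \mid x_t)
```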

Speech Recognition +2

FICGAN: Facial Identity Controllable GAN for De-identification

no code implementations 2 Oct 2021 Yonghyun Jeong, Jooyoung Choi, Sungwon Kim, Youngmin Ro, Tae-Hyun Oh, Doyeon Kim, Heonseok Ha, Sungroh Yoon

In this work, we present Facial Identity Controllable GAN (FICGAN), which not only generates high-quality de-identified face images with ensured privacy protection but also provides detailed controllability over attribute preservation for enhanced data utility.

Attribute · De-identification

Guided-TTS: Text-to-Speech with Untranscribed Speech

no code implementations 29 Sep 2021 Heeseung Kim, Sungwon Kim, Sungroh Yoon

By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training.

Speech Synthesis · Text-To-Speech Synthesis

AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate

no code implementations EMNLP 2021 Jongyoon Song, Sungwon Kim, Sungroh Yoon

Non-autoregressive neural machine translation (NART) models suffer from the multi-modality problem, which causes translation inconsistencies such as token repetition.

Knowledge Distillation · Machine Translation +1

ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models

1 code implementation ICCV 2021 Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, Sungroh Yoon

In this work, we propose Iterative Latent Variable Refinement (ILVR), a method to guide the generative process in DDPM to generate high-quality images based on a given reference image.

Denoising · Image Generation +2

NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity

1 code implementation NeurIPS 2020 Sang-gil Lee, Sungwon Kim, Sungroh Yoon

Normalizing flows (NFs) have become a prominent method for deep generative models that allow for an analytic probability density estimation and efficient synthesis.
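
The analytic density estimation mentioned here comes from the change-of-variables identity shared by all normalizing flows, sketched below; NanoFlow's parameter-sharing scheme is orthogonal to this identity.

```latex
% Change-of-variables identity used by normalizing flows for exact log-likelihood.
% f: the invertible mapping from data x to latent z ~ p_Z.
\log p_X(x) = \log p_Z\big(f(x)\big)
  + \log \left| \det \frac{\partial f(x)}{\partial x} \right|
```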

Density Estimation · Normalising Flows +1

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

5 code implementations NeurIPS 2020 Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon

By leveraging the properties of flows, MAS searches for the most probable monotonic alignment between text and the latent representation of speech.
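
A minimal sketch of the dynamic program behind monotonic alignment search (MAS) is given below, assuming a precomputed matrix of per-(token, frame) log-likelihoods; variable names are illustrative, and implementation details differ from the paper's optimized version.

```python
import numpy as np

def monotonic_alignment_search(log_likelihood: np.ndarray) -> np.ndarray:
    """Sketch of MAS: find the most probable monotonic alignment.

    log_likelihood[i, j] is the log-likelihood of latent frame j under the
    prior of text token i (shape: [num_tokens, num_frames]).
    Returns a 0/1 alignment matrix of the same shape.
    """
    n_text, n_mel = log_likelihood.shape
    Q = np.full((n_text, n_mel), -np.inf)

    # Forward pass: Q[i, j] is the best cumulative score with token i at frame j,
    # where each frame either stays on the current token or advances to the next.
    Q[0, 0] = log_likelihood[0, 0]
    for j in range(1, n_mel):
        for i in range(min(j + 1, n_text)):
            stay = Q[i, j - 1]
            advance = Q[i - 1, j - 1] if i > 0 else -np.inf
            Q[i, j] = max(stay, advance) + log_likelihood[i, j]

    # Backtracking: recover the monotonic path from the last token and frame.
    alignment = np.zeros_like(log_likelihood, dtype=np.int64)
    i = n_text - 1
    for j in range(n_mel - 1, -1, -1):
        alignment[i, j] = 1
        if i > 0 and (j == i or Q[i - 1, j - 1] > Q[i, j - 1]):
            i -= 1
    return alignment
```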

Ranked #4 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Text-To-Speech Synthesis

FloWaveNet: A Generative Flow for Raw Audio

2 code implementations 6 Nov 2018 Sungwon Kim, Sang-gil Lee, Jongyoon Song, Sungroh Yoon

Most modern text-to-speech architectures use a WaveNet vocoder to synthesize high-fidelity waveform audio, but practical applications have been limited by its slow autoregressive sampling scheme.

Sound · Audio and Speech Processing
