no code implementations • 27 Feb 2023 • Yoonhyung Lee, Jinhyeok Yang, Kyomin Jung
Also, the objective function of NF makes the model use the variance information and the text in a disentangled manner, resulting in more precise variance control.
no code implementations • 26 Jul 2022 • Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung
Then, the attention weights of each modality are applied directly to the other modality in a crossed way, so that the CAN gathers the audio and text information from the same time steps based on each modality.
no code implementations • 25 Jun 2022 • Yoonhyung Lee, Sungdong Lee, Joong-Ho Won
In this paper, we conduct an in-depth analysis of the two modes of ISGD for smooth convex functions, namely proximal Robbins-Monro (proxRM) and proximal Polyak-Ruppert (proxPR) procedures, for their use in statistical inference on model parameters.
1 code implementation • ICLR 2021 • Yoonhyung Lee, Joongbo Shin, Kyomin Jung
Although early text-to-speech (TTS) models such as Tacotron 2 have succeeded in generating human-like speech, their autoregressive (AR) architectures have a limitation that they require a lot of time to generate a mel-spectrogram consisting of hundreds of steps.
1 code implementation • ACL 2020 • Joongbo Shin, Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung
Even though BERT achieves successful performance improvements in various supervised learning tasks, applying BERT for unsupervised tasks still holds a limitation that it requires repetitive inference for computing contextual language representations.
no code implementations • SEMEVAL 2019 • Yoonhyung Lee, Yanghoon Kim, Kyomin Jung
This paper describes our system for SemEval-2019 Task 3: EmoContext, which aims to predict the emotion of the third utterance considering two preceding utterances in a dialogue.
no code implementations • 16 May 2019 • Joongbo Shin, Yoonhyung Lee, Kyomin Jung
Recent studies have tried to use bidirectional LMs (biLMs) instead of conventional unidirectional LMs (uniLMs) for rescoring the $N$-best list decoded from the acoustic model.
Automatic Speech Recognition (ASR)