no code implementations • 11 Nov 2022 • Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee
However, the emotional latent space learned by existing models makes it difficult to control continuous emotional intensity, because features such as emotion and speaker identity are entangled.
no code implementations • 31 Oct 2022 • Eungbeom Kim, Jinhee Kim, Yoori Oh, KyungSu Kim, Minju Park, Jaeheon Sim, Jinwoo Lee, Kyogu Lee
In this paper, we aim to unveil the impact of data augmentation in audio-language multi-modal learning, which has not been explored despite its importance.
Ranked #2 on Audio to Text Retrieval on AudioCaps