no code implementations • 10 Apr 2024 • Taegyun Kwon, Dasaem Jeong, Juhan Nam
To this end, we propose novel architectures for convolutional recurrent neural networks, redesigning an existing autoregressive piano transcription model.
no code implementations • 24 Jan 2024 • Hounsu Kim, Soonbeom Choi, Juhan Nam
Synthesizing the sound of a guitar performance is a highly challenging task due to its polyphony and high variability in expression.
no code implementations • 17 Jan 2024 • Yoonjin Chung, Junwon Lee, Juhan Nam
T-Foley generates high-quality audio using two conditions: the sound class and temporal event feature.
no code implementations • 17 Jan 2024 • Jiyun Park, Sangeon Yong, Taegyun Kwon, Juhan Nam
The goal of real-time lyrics alignment is to take live singing audio as input and to pinpoint the exact position within given lyrics on the fly.
1 code implementation • 16 Nov 2023 • Ilaria Manco, Benno Weck, Seungheon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models.
no code implementations • 24 Sep 2023 • Yeonghyeon Lee, Inmo Yeon, Juhan Nam, Joon Son Chung
This paper presents VoiceLDM, a model designed to produce audio that accurately follows two distinct natural language text prompts: the description prompt and the content prompt.
2 code implementations • 20 Sep 2023 • Haven Kim, Jongmin Jung, Dasaem Jeong, Juhan Nam
To broaden the scope of genres and languages in lyric translation studies, we introduce a novel singable lyric translation dataset, approximately 89% of which consists of K-pop song lyrics.
no code implementations • 26 Aug 2023 • Haven Kim, Kento Watanabe, Masataka Goto, Juhan Nam
Lyric translation plays a pivotal role in amplifying the global resonance of music, bridging cultural divides, and fostering universal connections.
1 code implementation • 31 Jul 2023 • Taejun Kim, Juhan Nam
Music is characterized by complex hierarchical structures.
1 code implementation • 31 Jul 2023 • Seungheon Doh, Keunwoo Choi, Jongpil Lee, Juhan Nam
In addition, we trained a transformer-based music captioning model with the dataset and evaluated it under zero-shot and transfer-learning settings.
no code implementations • 25 Jun 2023 • Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
Automatic detection of singing techniques from audio tracks can be beneficial for understanding how each singer expresses a performance, yet it is also difficult due to the wide variety of singing techniques.
no code implementations • 12 Apr 2023 • Sangeon Yong, Li Su, Juhan Nam
Note-level automatic music transcription is one of the most representative music information retrieval (MIR) tasks and has been studied for various instruments to understand music.
no code implementations • 19 Mar 2023 • Seungheon Doh, Minz Won, Keunwoo Choi, Juhan Nam
We introduce a framework that recommends music based on the emotions of speech.
1 code implementation • 14 Jan 2023 • Haven Kim, Seungheon Doh, Junwon Lee, Juhan Nam
Automatically generating or captioning music playlist titles from a set of tracks is of significant interest to music streaming services: customized playlists are widely used in personalized music recommendation, and well-composed titles attract users and aid their music discovery.
3 code implementations • 26 Nov 2022 • Seungheon Doh, Minz Won, Keunwoo Choi, Juhan Nam
This paper introduces effective design choices for text-to-music retrieval systems.
1 code implementation • 14 Nov 2022 • Eunjin Choi, Yoonjin Chung, Seolhee Lee, JongIk Jeon, Taegyun Kwon, Juhan Nam
In addition, they generally lack high-level annotations such as emotion tags.
no code implementations • 31 Oct 2022 • Yuya Yamamoto, Juhan Nam, Hiroko Terasawa
In this paper, we focus on singing techniques within the scope of music information retrieval research.
1 code implementation • 25 Mar 2022 • Sangeun Kum, Jongpil Lee, Keunhyoung Luke Kim, Taehyoung Kim, Juhan Nam
We address this issue by using pseudo-labels produced by vocal pitch estimation models on unlabeled data.
no code implementations • 13 Oct 2021 • Soonbeom Choi, Juhan Nam
We also show that the proposed model can be trained with speech audio and text labels yet generate singing voice at inference time.
no code implementations • NLP4MusA 2021 • Seungheon Doh, Junwon Lee, Juhan Nam
We propose a machine-translation approach to automatically generate a playlist title from a set of music tracks.
no code implementations • 2 Oct 2020 • Taegyun Kwon, Dasaem Jeong, Juhan Nam
Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states.
1 code implementation • 24 Aug 2020 • Taejun Kim, Minsuk Choi, Evan Sacks, Yi-Hsuan Yang, Juhan Nam
A DJ mix is a sequence of music tracks concatenated seamlessly, typically rendered for audiences in a live setting by a DJ on stage.
no code implementations • 9 Aug 2020 • Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam
For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangled versions, and (3) evaluate all models on four tasks (training time, similarity retrieval, auto-tagging, and triplet prediction).
no code implementations • 9 Aug 2020 • Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam
For this task, it is typically necessary to define a similarity metric to compare one recording to another.
no code implementations • 23 Jul 2020 • Seungheon Doh, Jongpil Lee, Tae Hong Park, Juhan Nam
Word embedding, pioneered by Mikolov et al., is a staple technique for word representation in natural language processing (NLP) research, and it has also found popularity in music information retrieval tasks.
1 code implementation • ISMIR 2019 • Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Kyogu Lee, Juhan Nam
In this paper, we present an application of deep neural networks to modeling piano performance, imitating pianists' expressive control of tempo, dynamics, articulation, and pedaling.
1 code implementation • 30 Oct 2019 • Taejun Kim, Juhan Nam
End-to-end learning models using raw waveforms as input have shown superior performance in many audio recognition tasks.
1 code implementation • 5 Jul 2019 • Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam
Audio-based music classification and tagging is typically based on categorical supervised learning with a fixed set of labels.
no code implementations • 27 Jun 2019 • Jongpil Lee, Jiyoung Park, Juhan Nam
Supervised music representation learning has been performed mainly using semantic labels such as music genres.
1 code implementation • 26 Jun 2019 • Kyungyun Lee, Juhan Nam
We show the effectiveness of our system for singer identification and query-by-singer in both the same-domain and cross-domain tasks.
no code implementations • 20 Jun 2019 • Jeong Choi, Jongpil Lee, Jiyoung Park, Juhan Nam
Music classification and tagging is conducted through categorical supervised learning with a fixed set of labels.
1 code implementation • ICML 2019 • Dasaem Jeong, Taegyun Kwon, Yoojin Kim, Juhan Nam
A music score is often handled as one-dimensional sequential data.
1 code implementation • 18 Jul 2018 • Jongpil Lee, Kyungyun Lee, Jiyoung Park, Jang-Yeon Park, Juhan Nam
Recently, deep-learning-based recommendation systems have been actively explored to solve the cold-start problem using a hybrid approach.
4 code implementations • 4 Jun 2018 • Kyungyun Lee, Keunwoo Choi, Juhan Nam
Since the vocal component plays a crucial role in popular music, singing voice detection has been an active research topic in music information retrieval.
no code implementations • 4 Dec 2017 • Jongpil Lee, Taejun Kim, Jiyoung Park, Juhan Nam
Music, speech, and acoustic scene sound are often handled separately in the audio domain because of their different signal characteristics.
2 code implementations • 28 Oct 2017 • Taejun Kim, Jongpil Lee, Juhan Nam
Recent work has shown that the end-to-end approach using convolutional neural network (CNN) is effective in various types of machine learning tasks.
2 code implementations • 18 Oct 2017 • Jiyoung Park, Jongpil Lee, Jangyeon Park, Jung-Woo Ha, Juhan Nam
In this paper, we present a supervised feature learning approach that uses the artist labels annotated on every track as objective metadata.
1 code implementation • 21 Jun 2017 • Jongpil Lee, Juhan Nam
Music tag words that describe music audio by text have different levels of abstraction.
3 code implementations • 6 Mar 2017 • Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, Juhan Nam
Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text, and speech domains.
1 code implementation • 6 Mar 2017 • Jongpil Lee, Juhan Nam
Second, we extract audio features from each layer of the pre-trained convolutional networks separately and aggregate them over a long audio clip.
1 code implementation • 20 Aug 2015 • Juhan Nam, Jorge Herrera, Kyogu Lee
Feature learning and deep learning have drawn great attention in recent years as a way of transforming input data into more effective representations using learning algorithms.