no code implementations • 14 Dec 2023 • Julian D. Parker, Janne Spijkervet, Katerina Kosta, Furkan Yesiler, Boris Kuznetsov, Ju-Chiang Wang, Matt Avent, Jitong Chen, Duc Le
End-to-end generation of musical audio using deep learning techniques has seen an explosion of activity recently.
no code implementations • 28 Aug 2023 • Bing Han, Junyu Dai, Weituo Hao, Xinyan He, Dong Guo, Jitong Chen, Yuxuan Wang, Yanmin Qian, Xuchen Song
We evaluated InstructME on instrument editing, remixing, and multi-round editing.
1 code implementation • 19 Nov 2021 • Siyuan Shan, Lamtharn Hantrakul, Jitong Chen, Matt Avent, David Trevelyan
Differentiable Wavetable Synthesis (DWTS) is a technique for neural audio synthesis which learns a dictionary of one-period waveforms, i.e. wavetables, through end-to-end training.
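The core idea — mixing a dictionary of one-period waveforms and reading it out by phase accumulation — can be sketched as follows. This is a minimal illustrative oscillator, not the paper's trained model; the toy dictionary and mixing weights are hypothetical stand-ins for quantities DWTS learns end-to-end.

```python
import numpy as np

def wavetable_oscillator(wavetables, weights, freq_hz, sr=16000, n_samples=16000):
    """Synthesize audio by mixing a dictionary of one-period wavetables.

    wavetables: (K, L) array, K one-period waveforms of length L.
    weights:    (K,) mixing weights (in DWTS these would be learned).
    """
    table = weights @ wavetables                      # blend tables into one period
    L = table.shape[0]
    # phase accumulation: step through the table at the target frequency
    phase = np.cumsum(np.full(n_samples, freq_hz / sr)) % 1.0
    idx = phase * L
    i0 = np.floor(idx).astype(int) % L
    i1 = (i0 + 1) % L
    frac = idx - np.floor(idx)
    return (1 - frac) * table[i0] + frac * table[i1]  # linear interpolation

# toy dictionary: a sine and a simple saw, mixed 70/30 at 440 Hz
tables = np.stack([np.sin(2 * np.pi * np.linspace(0, 1, 64, endpoint=False)),
                   2 * np.linspace(0, 1, 64, endpoint=False) - 1])
audio = wavetable_oscillator(tables, np.array([0.7, 0.3]), 440.0)
```

In DWTS the tables themselves are parameters, so gradients from an audio loss can reshape the waveforms directly.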
no code implementations • 26 Mar 2021 • Ju-Chiang Wang, Jordan B. L. Smith, Jitong Chen, Xuchen Song, Yuxuan Wang
This paper presents a novel supervised approach to detecting the chorus segments in popular music.
3 code implementations • 11 Oct 2020 • Qiuqiang Kong, Bochen Li, Jitong Chen, Yuxuan Wang
In this article, we create a GiantMIDI-Piano (GP) dataset containing 38,700,838 transcribed notes and 10,855 unique solo piano works composed by 2,786 composers.
no code implementations • 23 Apr 2020 • Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma
This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration allocated Tacotron-like acoustic models and WaveRNN neural vocoders.
5 code implementations • ICLR 2019 • Wei Ping, Kainan Peng, Jitong Chen
In this work, we propose a new solution for parallel wave generation by WaveNet.
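Parallel wave generation replaces WaveNet's sample-by-sample loop with a flow that transforms white noise into audio in one pass: every timestep is an elementwise affine transform whose parameters are predicted jointly. The sketch below shows only that transform with hypothetical stand-in parameter values, not the paper's distilled network.

```python
import numpy as np

rng = np.random.default_rng(0)

def iaf_layer(z, mu, log_sigma):
    """One inverse-autoregressive-flow step: an elementwise affine map
    x = z * exp(log_sigma) + mu. Because mu and log_sigma for all
    timesteps are computed together, generation is fully parallel."""
    return z * np.exp(log_sigma) + mu

z = rng.standard_normal(1000)                 # white-noise input
# hypothetical parameter predictions (a real model conditions on mel features)
mu, log_sigma = 0.1 * np.tanh(z), -1.0 + 0.0 * z
x = iaf_layer(z, mu, log_sigma)               # all samples produced at once
```

The transform is invertible, which is what lets such a student model be trained against an autoregressive WaveNet teacher.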
2 code implementations • NeurIPS 2018 • Sercan Ö. Arık, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.
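One variant of this adaptation keeps the pretrained generator frozen and fits only a new speaker embedding to the few cloning samples. The toy below makes the generator a fixed linear map so the idea fits in a few lines; the matrix, target features, and learning rate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# stand-in for a frozen pretrained multi-speaker generator: output = W @ embedding
W = rng.standard_normal((32, 8))
target = rng.standard_normal(32)      # features extracted from cloning samples

e = np.zeros(8)                       # new speaker's embedding: the only trainable part
lr = 0.005
for _ in range(500):
    residual = W @ e - target
    e -= lr * 2 * W.T @ residual      # gradient step on ||W e - target||^2

final_loss = float(np.sum((W @ e - target) ** 2))
```

The alternative in the paper fine-tunes the whole model on the cloning samples, trading more adaptation capacity for more parameters to update.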
no code implementations • 24 Aug 2017 • DeLiang Wang, Jitong Chen
A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data.
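A common training target in this supervised formulation is a time-frequency mask such as the ideal ratio mask: per bin, the fraction of energy belonging to speech. The sketch below computes it from toy spectrogram magnitudes; in practice a DNN is trained to predict the mask from features of the noisy mixture.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Ideal ratio mask: per time-frequency bin, the proportion of
    energy attributable to speech, in [0, 1]."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return s2 / (s2 + n2 + eps)

# toy magnitude spectrograms (frequency bins x time frames)
rng = np.random.default_rng(2)
speech = np.abs(rng.standard_normal((64, 10)))
noise = np.abs(rng.standard_normal((64, 10)))

mask = ideal_ratio_mask(speech, noise)
enhanced = mask * (speech + noise)    # apply the mask to the mixture magnitude
```

Bins dominated by speech get a mask near 1 and are kept; noise-dominated bins are attenuated toward 0.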
no code implementations • 24 Jul 2017 • Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
In this work, we perform an empirical comparison among the CTC, RNN-Transducer, and attention-based Seq2Seq models for end-to-end speech recognition.
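Of the three, CTC has the simplest decoding rule, which illustrates how it aligns long frame sequences to short label sequences: take the best label per frame, collapse repeats, then drop blanks. A minimal greedy decoder:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, then remove
    blanks. This many-to-one mapping is what lets CTC score a short
    transcript against a longer sequence of acoustic frames."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# frame-wise argmax labels for a toy utterance (0 is the blank symbol)
frames = [0, 1, 1, 0, 0, 2, 2, 2, 0, 3]
print(ctc_greedy_decode(frames))   # [1, 2, 3]
```

Note that a blank between two identical labels keeps them distinct: `[1, 0, 1]` decodes to `[1, 1]`, while `[1, 1]` collapses to `[1]` — the mechanism RNN-Transducer and attention models replace with their own alignment schemes.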