no code implementations • 21 Mar 2022 • Zewang Zhang, Yibin Zheng, Xinhui Li, Li Lu
To improve the accuracy and naturalness of the synthesized singing voice, we design several purpose-built modules and techniques: 1) a deep bi-directional LSTM-based duration model with a multi-scale rhythm loss and a post-processing step; 2) a Transformer-like acoustic model with a progressive pitch-weighted decoder loss; 3) a 24 kHz pitch-aware LPCNet neural vocoder to produce high-quality singing waveforms; 4) a novel data augmentation method with multi-singer pre-training for stronger robustness and naturalness.
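The abstract does not spell out how the pitch-weighted decoder loss is computed. As an illustration only, here is a minimal sketch of one plausible form: a per-frame spectral L1 loss whose frame weights grow with normalized pitch, so high-pitch frames (where singing errors are most audible) are emphasized. The function name, the weighting scheme, and `alpha` are all assumptions, not the authors' implementation.

```python
import numpy as np

def pitch_weighted_loss(pred, target, pitch, alpha=1.0):
    """Hypothetical pitch-weighted reconstruction loss.

    pred, target: (frames, mel_bins) acoustic features
    pitch:        (frames,) F0 in Hz, 0 for unvoiced frames
    """
    # Weight each frame by its pitch relative to the utterance maximum,
    # so higher-pitched frames contribute more to the loss.
    weights = 1.0 + alpha * pitch / (pitch.max() + 1e-8)
    per_frame = np.abs(pred - target).mean(axis=-1)  # plain L1 per frame
    return float((weights * per_frame).mean())
```

A "progressive" variant could anneal `alpha` from 0 toward 1 over training so the pitch emphasis is introduced gradually.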
1 code implementation • 24 Nov 2020 • Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, LingHui Chen, Lei Xie, Shan Liu
On one hand, we propose to discriminate ground-truth waveforms from synthetic ones in the frequency domain rather than only in the time domain, which offers stronger consistency guarantees.
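As a rough sketch of what comparing real and synthetic waveforms in the frequency domain can look like, the code below measures the distance between their log-STFT magnitudes. The function names, window/hop choices, and the use of a fixed spectral distance (instead of the learned discriminator the paper describes) are illustrative assumptions.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    # Magnitude STFT via Hann-windowed framed FFT.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def spectral_distance(real, fake):
    # L1 distance between log magnitudes: a stand-in for the
    # frequency-domain signal a discriminator would see.
    r, f = stft_mag(real), stft_mag(fake)
    return float(np.mean(np.abs(np.log1p(r) - np.log1p(f))))
```

In an adversarial setup, the discriminator would receive these spectral features (often at multiple resolutions) instead of, or in addition to, raw waveform samples.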
no code implementations • 12 May 2020 • Zewang Zhang, Qiao Tian, Heng Lu, Ling-Hui Chen, Shan Liu
This paper investigates how to leverage a DurIAN-based average model to enable a new speaker to have both accurate pronunciation and fluent cross-lingual speaking with very limited monolingual data.
no code implementations • 22 Nov 2016 • Zewang Zhang, Zheng Sun, Jiaqi Liu, Jingwen Chen, Zhao Huo, Xiao Zhang
We further show that applying deep residual learning can boost the convergence speed of our novel deep recurrent convolutional networks.
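The core of deep residual learning is the identity shortcut: each block outputs its input plus a learned transform of it, which eases gradient flow in deep stacks. A minimal framework-free sketch (the function name is an assumption; the paper's actual layers are recurrent convolutional blocks):

```python
import numpy as np

def residual_block(x, transform):
    # Identity shortcut: output = x + F(x). Gradients can flow through
    # the "+ x" path unchanged, which speeds convergence in deep networks.
    return x + transform(x)
```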
Automatic Speech Recognition (ASR) +2
no code implementations • 16 Nov 2016 • Zheng Sun, Jiaqi Liu, Zewang Zhang, Jingwen Chen, Zhao Huo, Ching Hua Lee, Xiao Zhang
Creating aesthetically pleasing pieces of art, including music, has been a long-term goal for artificial intelligence research.