ClariNet is an end-to-end text-to-speech (TTS) architecture. Unlike previous TTS systems, which pair a text-to-spectrogram model with a separate waveform synthesizer (vocoder), ClariNet is a fully convolutional text-to-wave architecture that can be trained from scratch. In ClariNet, the WaveNet module is conditioned on the hidden states instead of the mel-spectrogram. The architecture is otherwise based on Deep Voice 3.

Source: ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
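
The difference between conditioning the waveform model on a mel-spectrogram and conditioning it on hidden states can be illustrated with a toy NumPy sketch. This is not the ClariNet implementation: the dimensions, the `toy_vocoder` stand-in, the random weights, and the nearest-neighbour upsampling are all assumptions chosen to keep the example small. It only shows where the conditioning signal comes from in each design.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM = 16  # hypothetical size of the frame-level hidden states
N_MELS = 80      # hypothetical number of mel bands
HOP = 4          # hypothetical number of audio samples per frame


def upsample(frames, hop):
    """Repeat each frame-level vector `hop` times (nearest-neighbour upsampling)."""
    return np.repeat(frames, hop, axis=0)


def toy_vocoder(conditioner, weights):
    """Stand-in for a conditioned waveform model: one sample per conditioning row."""
    return np.tanh(conditioner @ weights)


def two_stage(hidden, to_mel, vocoder_weights):
    """Text-to-spectrogram, then vocoder: the vocoder only sees the mel frames."""
    mel = hidden @ to_mel  # shape (frames, N_MELS)
    return toy_vocoder(upsample(mel, HOP), vocoder_weights)


def text_to_wave(hidden, vocoder_weights):
    """Text-to-wave: the vocoder is conditioned on the hidden states directly."""
    return toy_vocoder(upsample(hidden, HOP), vocoder_weights)


hidden = rng.standard_normal((10, HIDDEN_DIM))  # 10 frames of hidden states
to_mel = rng.standard_normal((HIDDEN_DIM, N_MELS))

wave_two_stage = two_stage(hidden, to_mel, rng.standard_normal(N_MELS))
wave_direct = text_to_wave(hidden, rng.standard_normal(HIDDEN_DIM))

print(wave_two_stage.shape, wave_direct.shape)
```

In the second path there is no intermediate spectrogram between the text model and the waveform model, so in a real system gradients from the waveform loss could flow back into the text model and both could be trained jointly.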

Latest Papers

PAPER DATE
Learning from a Complementary-label Source Domain: Theory and Algorithms
Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu
2020-08-04
Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation
Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu
2020-07-29
CLARINET: A RISC-V Based Framework for Posit Arithmetic Empiricism
Riya Jain, Niraj Sharma, Farhad Merchant, Sachin Patkar, Rainer Leupers
2020-05-30
Multi-Speaker End-to-End Speech Synthesis
Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
2019-07-09
Non-Autoregressive Neural Text-to-Speech
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
2019-05-21
Neural source-filter waveform models for statistical parametric speech synthesis
Xin Wang, Shinji Takaki, Junichi Yamagishi
2019-04-27
FloWaveNet: A Generative Flow for Raw Audio
Sungwon Kim, Sang-gil Lee, Jongyoon Song, Sungroh Yoon
2018-11-06
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Wei Ping, Kainan Peng, Jitong Chen
2018-07-19

Tasks

TASK PAPERS SHARE
Speech Synthesis 3 37.50%
Domain Adaptation 2 25.00%
Unsupervised Domain Adaptation 2 25.00%
Text-To-Speech Synthesis 1 12.50%

Categories