Tacotron

Introduced by Wang et al. in Tacotron: Towards End-to-End Speech Synthesis

Tacotron is an end-to-end generative text-to-speech model that takes a character sequence as input and outputs the corresponding spectrogram. Its backbone is a seq2seq model with attention. The figure depicts the model, which includes an encoder, an attention-based decoder, and a post-processing net. At a high level, the model takes characters as input and produces spectrogram frames, which are then converted to waveforms.
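The data flow described above can be illustrated with a toy sketch. This is not the architecture from the paper: the encoder, attention-based decoder, and post-processing net are each replaced by small random linear maps, the attention is plain dot-product attention, the frame count is fixed in advance, and the final spectrogram-to-waveform step is omitted. All names and sizes (`VOCAB`, `EMBED_DIM`, `N_DECODER`, `N_POST`, `synthesize`) are hypothetical and chosen only to keep the example small.

```python
import math
import random

random.seed(0)

# Hypothetical character set and layer sizes, for illustration only.
VOCAB = "abcdefghijklmnopqrstuvwxyz '.,?!"
EMBED_DIM = 8   # size of each encoder state
N_DECODER = 4   # size of each decoder output frame
N_POST = 6      # size of each post-processed frame


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Untrained random weights standing in for the learned networks.
EMBEDDING = rand_matrix(len(VOCAB), EMBED_DIM)
W_QUERY = rand_matrix(EMBED_DIM, N_DECODER)  # previous frame -> attention query
W_FRAME = rand_matrix(N_DECODER, EMBED_DIM)  # context vector -> next frame
W_POST = rand_matrix(N_POST, N_DECODER)      # decoder frame -> output frame


def encode(text):
    """Encoder stand-in: one vector per known character."""
    return [EMBEDDING[VOCAB.index(ch)] for ch in text.lower() if ch in VOCAB]


def decode(encoder_states, n_frames):
    """Decoder stand-in: each step attends over all encoder states."""
    frames, alignments = [], []
    prev_frame = [0.0] * N_DECODER  # all-zero start frame
    for _ in range(n_frames):
        query = matvec(W_QUERY, prev_frame)
        scores = [sum(q * s for q, s in zip(query, state)) for state in encoder_states]
        weights = softmax(scores)
        context = [
            sum(w * state[d] for w, state in zip(weights, encoder_states))
            for d in range(EMBED_DIM)
        ]
        prev_frame = [math.tanh(x) for x in matvec(W_FRAME, context)]
        frames.append(prev_frame)
        alignments.append(weights)
    return frames, alignments


def postnet(frames):
    """Post-processing stand-in: one linear map applied to every frame."""
    return [matvec(W_POST, frame) for frame in frames]


def synthesize(text, n_frames=5):
    states = encode(text)
    if not states:
        raise ValueError("text contains no characters from VOCAB")
    frames, alignments = decode(states, n_frames)
    return postnet(frames), alignments
```

Calling `synthesize("hello", n_frames=3)` returns three output frames plus one attention distribution per frame over the five input characters. With random weights the values are meaningless; the sketch only shows how characters flow through the three stages.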

Source: Tacotron: Towards End-to-End Speech Synthesis

Latest Papers

PAPER, AUTHORS, DATE
Whispered and Lombard Neural Speech Synthesis
Qiong Hu, Tobias Bleisch, Petko Petkov, Tuomo Raitio, Erik Marchi, Varun Lakshminarasimhan
2021-01-13
Non-Attentive Tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling
Anonymous
2021-01-01
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech
Anonymous
2021-01-01
Parallel WaveNet conditioned on VAE latent vectors
Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote
2020-12-17
Using previous acoustic context to improve Text-to-Speech synthesis
Pilar Oplustil-Gallegos, Simon King
2020-12-07
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement
Hamed Hemati, Damian Borth
2020-11-12
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma
2020-11-06
Learning Speaker Embedding from Text-to-Speech
Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak
2020-10-21
Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations
Antoine Perquin, Erica Cooper, Junichi Yamagishi
2020-10-21
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu
2020-10-08
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon
2020-10-01
Controllable neural text-to-speech synthesis using intuitive prosodic features
Tuomo Raitio, Ramya Rasipuram, Dan Castellani
2020-09-14
Corrective feedback, emphatic speech synthesis, visual-speech exaggeration, pronunciation learning
Yaohua Bu, Weijun Li, Tianyi Ma, Shengqi Chen, Jia Jia, Kun Li, Xiaobo Lu
2020-09-12
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
Dipjyoti Paul, Muhammed PV Shifas, Yannis Pantazis, Yannis Stylianou
2020-08-13
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS
Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li
2020-08-11
SpeedySpeech: Efficient Neural Speech Synthesis
Jan Vainer, Ondřej Dušek
2020-08-09
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Tomáš Nekvinda, Ondřej Dušek
2020-08-03
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yusuke Yasuda, Xin Wang, Junichi Yamagishi
2020-05-20
End-To-End Speech Synthesis Applied to Brazilian Portuguese
Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Frederico Santos de Oliveira, João Paulo Teixeira, Moacir Antonelli Ponti, Sandra Maria Aluisio
2020-05-11
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang
2019-11-11
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu
2019-09-25
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran
2019-07-09
A New GAN-based End-to-End TTS Training Algorithm
Haohan Guo, Frank K. Soong, Lei He, Lei Xie
2019-04-09
Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data
Roee Levy Leshem, Raja Giryes
2019-04-06
Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis
Yanyao Bian, Changbin Chen, Yongguo Kang, Zhenglin Pan
2019-04-04
Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
2019-03-29
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language
Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
2018-10-29
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan
2018-08-30
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Daisy Stanton, Yuxuan Wang, RJ Skerry-Ryan
2018-08-04
Voice Imitating Text-to-Speech Neural Networks
Younggun Lee, Taesu Kim, Soo-Young Lee
2018-06-04
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron J. Weiss, Rob Clark, Rif A. Saurous
2018-03-24
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous
2018-03-23
Emotional End-to-End Neural Speech Synthesizer
Younggun Lee, Azam Rabiee, Soo-Young Lee
2017-11-15
Uncovering Latent Style Factors for Expressive Speech Synthesis
Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark, Rif A. Saurous
2017-11-01
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
2017-05-24
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous
2017-03-29

Categories