no code implementations • 3 Aug 2020 • Ahmed Hussen Abdelaziz, Anushree Prasanna Kumar, Chloe Seivwright, Gabriele Fanelli, Justin Binder, Yannis Stylianou, Sachin Kajarekar
The output acoustic features are used to condition a WaveRNN to reconstruct the speech waveform, and the output facial controllers are used to generate the corresponding video of the talking face.
no code implementations • 15 May 2019 • Ahmed Hussen Abdelaziz, Barry-John Theobald, Justin Binder, Gabriele Fanelli, Paul Dixon, Nicholas Apostoloff, Thibaut Weise, Sachin Kajarekar
We conclude that visual speech synthesis can significantly benefit from the powerful representation of speech in the ASR acoustic models.
Automatic Speech Recognition (ASR)