Lip to Speech Synthesis
5 papers with code • 1 benchmark • 2 datasets
Given a silent video of a speaker, generate the corresponding speech that matches the lip movements.
Most implemented papers
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations of the acoustic feature reconstruction loss.
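The multi-task idea above can be sketched as a weighted sum of an acoustic reconstruction loss and auxiliary text and audio supervision terms. This is a minimal illustrative sketch, not the paper's actual objective: the loss forms (L1 reconstruction) and the weights `w_text` and `w_audio` are assumptions.

```python
def l1_loss(pred, target):
    """Mean absolute error between two equal-length feature sequences."""
    assert len(pred) == len(target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def multitask_loss(recon, text, audio, w_text=0.5, w_audio=0.5):
    """Hypothetical total objective: reconstruction plus weighted
    auxiliary text and audio supervision terms."""
    return recon + w_text * text + w_audio * audio

# Toy example: predicted vs. target acoustic-frame values, plus
# placeholder auxiliary loss values.
recon = l1_loss([0.2, 0.4, 0.6], [0.0, 0.5, 0.5])
total = multitask_loss(recon, text=1.2, audio=0.8)
```

The auxiliary terms let gradients from text and audio supervision shape the same encoder, compensating for what the reconstruction loss alone misses.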
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.
Lip to Speech Synthesis with Visual Context Attentional GAN
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
Show Me Your Face, And I'll Tell You How You Speak
When we speak, the prosody and content of the speech can be inferred from the movement of our lips.
Intelligible Lip-to-Speech Synthesis with Speech Units
Therefore, the proposed L2S model is trained to generate multiple targets: a mel-spectrogram and speech units.
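The multi-target training can be sketched as a sum of a regression loss on the mel-spectrogram and a classification loss on discrete speech units. This is an assumed formulation for illustration only: the choice of MSE and cross-entropy, and the weight `w_unit`, are not taken from the paper.

```python
import math

def mel_loss(pred, target):
    """Mean squared error over (flattened) mel-spectrogram values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def unit_loss(probs, unit_ids):
    """Cross-entropy over predicted speech-unit distributions."""
    return -sum(math.log(p[u]) for p, u in zip(probs, unit_ids)) / len(unit_ids)

def l2s_loss(mel_pred, mel_tgt, unit_probs, unit_tgt, w_unit=1.0):
    """Hypothetical combined objective: mel regression + unit prediction."""
    return mel_loss(mel_pred, mel_tgt) + w_unit * unit_loss(unit_probs, unit_tgt)

# Toy example: two mel values and two unit predictions over 3 unit classes.
loss = l2s_loss([0.1, 0.3], [0.0, 0.5],
                [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
```

Predicting discrete speech units alongside the mel-spectrogram gives the model a target that abstracts away speaker-specific acoustic detail, which is what makes the output more intelligible.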