Lip to Speech Synthesis

5 papers with code • 1 benchmark • 2 datasets

Given a silent video of a speaker, generate the corresponding speech that matches the lip movements.

Most implemented papers

Lip-to-Speech Synthesis in the Wild with Multi-task Learning

ms-dot-k/Lip-to-Speech-Synthesis-in-the-Wild 17 Feb 2023

The authors design a multi-task learning scheme that guides the model with multimodal supervision, i.e., text and audio, to complement the insufficient word representations learned from the acoustic feature reconstruction loss alone.
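
A minimal sketch of such a multi-task objective, assuming a model that predicts a mel-spectrogram plus per-frame character logits from lip frames; the function name, tensor shapes, CTC choice for the text branch, and loss weight are illustrative assumptions, not the paper's released code:

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_mel, target_mel, char_logits, char_targets,
                   input_lengths, target_lengths, lambda_text=0.5):
    # Acoustic supervision: reconstruction loss on mel-spectrogram frames.
    recon = F.l1_loss(pred_mel, target_mel)
    # Text supervision: CTC over per-frame character logits, complementing
    # the weak word-level signal of reconstruction alone.
    log_probs = char_logits.log_softmax(dim=-1).transpose(0, 1)  # (T, B, C)
    ctc = F.ctc_loss(log_probs, char_targets, input_lengths, target_lengths)
    return recon + lambda_text * ctc
```

Weighting the text term lets the acoustic loss stay dominant while the transcript still shapes the learned word representations.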

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

Rudrabha/Lip2Wav CVPR 2020

In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker.

Lip to Speech Synthesis with Visual Context Attentional GAN

ms-dot-k/Visual-Context-Attentional-GAN NeurIPS 2021

In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
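
The core idea, loosely sketched below, is that each speech frame being generated attends over the whole sequence of lip features, so the decoder sees global visual context beyond the locally aligned video frame. This is an illustrative cross-attention module under assumed feature shapes, not the authors' VCA-GAN implementation:

```python
import torch
import torch.nn as nn

class VisualContextAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, speech_feats, visual_feats):
        # speech_feats: (B, T_audio, D) queries from the speech decoder
        # visual_feats: (B, T_video, D) keys/values from the lip encoder
        context, _ = self.attn(speech_feats, visual_feats, visual_feats)
        return speech_feats + context  # residual keeps local information

# Toy usage with random features.
vca = VisualContextAttention()
speech = torch.randn(2, 80, 256)   # 80 audio frames
video = torch.randn(2, 20, 256)    # 20 video frames
print(vca(speech, video).shape)    # torch.Size([2, 80, 256])
```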

Show Me Your Face, And I'll Tell You How You Speak

chris10m/lip2speech 28 Jun 2022

When we speak, the prosody and content of the speech can be inferred from the movement of our lips.

Intelligible Lip-to-Speech Synthesis with Speech Units

choijeongsoo/lip2speech-unit 31 May 2023

The proposed L2S model is trained to generate multiple targets: a mel-spectrogram and discrete speech units.
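
A minimal sketch of this multi-target objective, assuming the model regresses a mel-spectrogram and classifies a discrete speech unit (e.g., a cluster id from a self-supervised speech model) per frame; the function name, shapes, and weight are hypothetical:

```python
import torch
import torch.nn.functional as F

def l2s_multitarget_loss(pred_mel, target_mel, unit_logits, target_units,
                         lambda_unit=1.0):
    # Continuous target: mel-spectrogram reconstruction.
    mel_loss = F.l1_loss(pred_mel, target_mel)
    # Discrete target: per-frame speech-unit classification.
    # unit_logits: (B, T, C) -> (B, C, T) for cross_entropy; target_units: (B, T)
    unit_loss = F.cross_entropy(unit_logits.transpose(1, 2), target_units)
    return mel_loss + lambda_unit * unit_loss
```

The discrete branch gives the model a clean classification signal for intelligibility, while the mel branch preserves acoustic detail for waveform synthesis.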