Unconstrained Lip-synchronization

3 papers with code • 3 benchmarks • 2 datasets

Given a video of an arbitrary person and arbitrary driving speech, the task is to generate a lip-synced video that matches the given speech.

This task requires the approach to not be constrained by identity, voice, or language.

Greatest papers with code

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

Rudrabha/Wav2Lip 23 Aug 2020

However, existing approaches fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
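Wav2Lip's central idea is to penalize out-of-sync generations with a pretrained lip-sync "expert" (a SyncNet-style discriminator) that scores how well an audio window matches a lip crop via the similarity of their embeddings. The following is a simplified, dependency-free sketch of one common formulation of such a sync loss, not the paper's actual implementation; the embedding vectors here stand in for the outputs of the audio and video encoders:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def sync_loss(video_emb, audio_emb, in_sync=True, eps=1e-7):
    # Binary cross-entropy on a sync probability derived from cosine
    # similarity: in-sync pairs should score near 1, off-sync near 0.
    p = (cosine_sim(video_emb, audio_emb) + 1) / 2  # map [-1, 1] -> [0, 1]
    p = min(max(p, eps), 1 - eps)                   # clamp for log stability
    return -math.log(p) if in_sync else -math.log(1 - p)
```

During generator training, every produced frame window is scored against its driving audio with the frozen expert; a low `sync_loss` means the expert judges the lips to match the speech.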


Towards Automatic Face-to-Face Translation

Rudrabha/LipGAN ACM Multimedia 2019

As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.

Ranked #1 on Talking Face Generation on LRW (using extra training data)


You said that?

joonson/yousaidthat 8 May 2017

To achieve this we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to generate synthesised talking face video frames.
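The joint embedding above means the face (identity) and the audio (speech content) are encoded separately and then concatenated into a single conditioning vector for the frame decoder. The toy sketch below illustrates only that data flow; the fixed random linear projections are hypothetical stand-ins for the paper's CNN encoders, and the dimensions are arbitrary:

```python
import random

def encode(x, out_dim, seed):
    # Toy "encoder": a fixed random linear projection standing in for a CNN.
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in x] for _ in range(out_dim)]
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def joint_embed(face_pixels, audio_window, emb_dim=4):
    # Encode identity (face) and speech (audio) separately, then
    # concatenate into the joint embedding the decoder conditions on.
    face_emb = encode(face_pixels, emb_dim, seed=0)
    audio_emb = encode(audio_window, emb_dim, seed=1)
    return face_emb + audio_emb  # length 2 * emb_dim

# One conditioning vector per output frame: same face, sliding audio window.
frame_code = joint_embed([0.2, 0.5, 0.1], [0.9, 0.3])
```

A decoder CNN would then map each such vector (plus skip connections from the face encoder, in the actual model) to a synthesized talking-face frame.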
