3 papers with code • 3 benchmarks • 2 datasets
Given a video of an arbitrary person, and an arbitrary driving speech, the task is to generate a lip-synced video that matches the given speech.
This task requires the approach to not be constrained by identity, voice, or language.
However, they fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking face videos, resulting in significant parts of the video being out-of-sync with the new audio.
Ranked #1 on Unconstrained Lip-synchronization on LRW
As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization.
Ranked #1 on Talking Face Generation on LRW (using extra training data)