9 papers with code • 7 benchmarks • 2 datasets
Talking head generation is the task of generating a talking face from a set of images of a person.
(Image credit: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models)
However, these works fail to accurately morph the lip movements of arbitrary identities in dynamic, unconstrained talking-face videos, leaving significant parts of the video out of sync with the new audio.
Ranked #1 on Unconstrained Lip-synchronization on LRW
In order to create a personalized talking head model, these works require training on a large dataset of images of a single person.
The texture image is generated offline, warped and added to the coarse image to ensure a high effective resolution of synthesized head views.
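The warp-and-add step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, array shapes, and the nearest-neighbor warp are all assumptions made for brevity.

```python
import numpy as np

def warp_and_add(coarse, texture, flow):
    """Warp an offline-generated texture image with a per-pixel flow field
    and add it to the coarse synthesized image.

    Hypothetical shapes: coarse and texture are (H, W) grayscale arrays,
    flow is (2, H, W) giving per-pixel (dy, dx) source offsets.
    Nearest-neighbor sampling is used here for simplicity; a real system
    would use bilinear sampling and learned warping fields.
    """
    h, w = coarse.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Displace sampling coordinates by the flow, clamped to the image bounds.
    src_y = np.clip(np.rint(ys + flow[0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[1]).astype(int), 0, w - 1)
    # Add the warped texture on top of the coarse image.
    return coarse + texture[src_y, src_x]

# With zero flow the warp is the identity, so the texture is added as-is.
coarse = np.full((4, 4), 0.5)
texture = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((2, 4, 4))
out = warp_and_add(coarse, texture, flow)
```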
In this work, we present a carefully-designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies.