10 papers with code • 1 benchmarks • 2 datasets
Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.
Specifically, we design an end-to-end talking face generation system that takes a speech utterance, a single face image, and a categorical emotion label as input to render a talking face video in sync with the speech and expressing the condition emotion.
The necessary attributes of having a realistic face animation are 1) audio-visual synchronization (2) identity preservation of the target individual (3) plausible mouth movements (4) presence of natural eye blinks.
Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image.