Talking Face Generation
37 papers with code • 2 benchmarks • 6 datasets
Talking face generation aims to synthesize a sequence of face images that corresponds to given speech semantics.
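Abstractly, the task maps a reference identity image plus per-frame audio features to a sequence of face frames. The sketch below illustrates only that interface; the function name, shapes, and the energy-based "lip motion" are invented stand-ins for a learned generator, not any listed paper's method.

```python
import numpy as np

def generate_talking_face(reference_image: np.ndarray,
                          audio_features: np.ndarray) -> np.ndarray:
    """Toy stand-in for a talking-face generator (illustrative only).

    reference_image: (H, W, 3) identity frame.
    audio_features:  (T, D) one feature vector per output video frame.
    Returns:         (T, H, W, 3) frame sequence, one frame per audio step.

    Real systems replace this body with a learned network that animates
    the lips (and possibly head pose and blinks) in sync with the audio.
    """
    T = audio_features.shape[0]
    frames = np.repeat(reference_image[None], T, axis=0).astype(np.float32)
    # Fake "lip motion": modulate the lower half of the face by the audio
    # energy at each step (a real model predicts this motion instead).
    energy = np.abs(audio_features).mean(axis=1)            # (T,)
    h = reference_image.shape[0]
    frames[:, h // 2:] *= (1.0 + 0.1 * energy)[:, None, None, None]
    return frames

ref = np.ones((64, 64, 3), dtype=np.float32)
audio = np.random.randn(25, 80)   # e.g. 25 frames of 80-dim mel features
video = generate_talking_face(ref, audio)
print(video.shape)                # (25, 64, 64, 3)
```

The point of the interface is the one-to-one pairing of audio steps and video frames; the papers below differ mainly in what else (pose, resolution, identity) they control.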
(Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)
Most implemented papers
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
While speech content can be captured by learning the intrinsic synchronization between the audio and visual modalities, we identify that a complementary pose code can be learned within a modulated-convolution-based reconstruction framework.
Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic.
Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
To synthesize high-definition videos, we build a large in-the-wild high-resolution audio-visual dataset and propose a novel flow-guided talking face generation framework.
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text
Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure.
Parallel and High-Fidelity Text-to-Lip Generation
However, autoregressive (AR) decoding generates the current lip frame conditioned on previously generated frames, which inherently limits inference speed and also degrades the quality of the generated lip frames through error propagation.
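The speed and error-propagation argument can be illustrated with a toy sequence model: an AR decoder conditions each step on its own (possibly erroneous) previous output, while a parallel decoder predicts every step directly from the conditioning signal. Both functions below are schematic assumptions for illustration, not the paper's model.

```python
import numpy as np

def ar_decode(cond, steps, noise=0.1, seed=0):
    """Autoregressive: each output depends on the previously *generated*
    output, so per-step noise compounds over time (error propagation),
    and the loop is inherently sequential (O(T) latency)."""
    rng = np.random.default_rng(seed)
    out, prev = [], cond[0]
    for t in range(steps):
        prev = 0.9 * prev + 0.1 * cond[t] + noise * rng.standard_normal()
        out.append(prev)
    return np.array(out)

def parallel_decode(cond, steps, noise=0.1, seed=0):
    """Non-autoregressive: every output is predicted directly from the
    conditioning (text/audio) signal, so per-step errors stay independent
    and all steps can be computed at once."""
    rng = np.random.default_rng(seed)
    return cond[:steps] + noise * rng.standard_normal(steps)

cond = np.linspace(0.0, 1.0, 50)   # stand-in conditioning signal
ar = ar_decode(cond, 50)
par = parallel_decode(cond, 50)
# Averaged over seeds, the AR outputs drift further from the conditioning
# signal than the parallel ones, mirroring the error-propagation claim.
```

Averaging the deviation from `cond` over many seeds shows the AR trace accumulating both lag and noise, while the parallel trace's error stays flat.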
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
In this paper, we propose a talking face generation method that takes an audio signal as input and a short target video clip as reference, and synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in-sync with the input audio signal.
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space.
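The "manifold projection" idea of snapping extracted features onto a target person's speech space can be approximated generically by reconstructing each feature from its nearest neighbors in a bank of that person's features. The k-NN averaging below is a hedged sketch of that general idea; the function name, shapes, and feature bank are all invented for illustration.

```python
import numpy as np

def manifold_project(features, target_bank, k=3):
    """Project each feature onto the target speaker's feature manifold,
    approximated here as the mean of its k nearest bank entries."""
    projected = np.empty_like(features)
    for i, f in enumerate(features):
        d = np.linalg.norm(target_bank - f, axis=1)   # distance to each bank entry
        nn = np.argsort(d)[:k]                        # indices of k nearest neighbors
        projected[i] = target_bank[nn].mean(axis=0)   # local average on the manifold
    return projected

rng = np.random.default_rng(0)
bank = rng.standard_normal((200, 16))    # features collected from the target person
feats = rng.standard_normal((10, 16))    # new audio features to be projected
proj = manifold_project(feats, bank)
print(proj.shape)                        # (10, 16)
```

With `k=1` the projection returns the single closest target feature unchanged, which is a useful sanity check that projected features always come from the target's space.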
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN
Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution.
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel.
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
Thus the facial radiance field can be flexibly adjusted to the new identity with few reference images.