Talking Face Generation
37 papers with code • 2 benchmarks • 6 datasets
Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.
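At its simplest, the task can be sketched as a function from audio features and a reference identity image to a sequence of video frames. The sketch below is purely illustrative: the function name, feature shapes, and frame rates are assumptions for exposition, and the "generator" just tiles the reference image where a real model would synthesize lip and pose motion conditioned on the audio.

```python
import numpy as np

def generate_talking_face(audio_features: np.ndarray,
                          reference_image: np.ndarray,
                          video_fps: int = 25,
                          audio_fps: int = 100) -> np.ndarray:
    """Toy stand-in for a talking face generator.

    audio_features: (T_audio, D) acoustic features, e.g. mel-spectrogram frames.
    reference_image: (H, W, 3) identity image.
    Returns (T_video, H, W, 3) frames aligned to the audio duration.
    """
    t_audio = audio_features.shape[0]
    t_video = int(t_audio * video_fps / audio_fps)
    # A real model predicts new frames; here we merely repeat the reference
    # to show the expected input/output contract of the task.
    return np.repeat(reference_image[None], t_video, axis=0)

audio = np.random.randn(100, 80)             # 1 s of 80-dim features at 100 fps
ref = np.zeros((96, 96, 3), dtype=np.uint8)  # reference identity image
video = generate_talking_face(audio, ref)
print(video.shape)                           # (25, 96, 96, 3)
```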
(Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)
Latest papers with no code
FT2TF: First-Person Statement Text-To-Talking Face Generation
This achievement highlights our model's capability to bridge first-person statements and dynamic face generation, providing insightful guidance for future work.
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference, and synthesizes a photo-realistic talking video with head pose controlled by a short video clip and appropriate eye-blinking embedding.
ToonTalker: Cross-Domain Face Reenactment
Moreover, since no paired data is provided, we propose a novel cross-domain training scheme using data from two domains with the designed analogy constraint.
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
With our essential designs for facial style learning, our model flexibly captures the expressive facial style of arbitrary video prompts and transfers it onto a personalized image renderer in a zero-shot manner.
Audio-driven Talking Face Generation by Overcoming Unintended Information Flow
Specifically, this involves the unintended flow of lip, pose, and other information from the reference to the generated image, as well as instabilities during model training.
FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction
DeepFake-based digital facial forgery threatens public media security, especially when lip manipulation is used in talking face generation, which makes fake video detection even more difficult.
Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions
Given a short speech video, we first build an efficient talking radiance field, then apply the latest conditional diffusion model for image editing based on the given instructions, guiding the optimization of the implicit representation toward the editing target.
Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation
The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion.
CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation
Talking face generation has recently drawn ever-increasing attention from the computer vision research community due to its formidable challenges and widespread application scenarios, e.g., movie animation and virtual anchors.
Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator
More specifically, given a textured face as the source and a rendered face projected from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model decomposes the complex transfer problem into a multi-conditional denoising process. A Texture Attention-based module accurately models the correspondences between the appearance and geometry cues contained in the source and target conditions, and incorporates extra implicit information for high-fidelity talking face generation.
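The multi-conditional denoising idea can be illustrated with a generic DDPM-style reverse step that receives both texture (appearance) and geometry (rendered 3DMM) cues as conditions. This is a minimal sketch under standard diffusion assumptions, not the paper's model: the noise predictor below is a toy stand-in for the learned Texture Attention-based network.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x_t, t, eps_pred, betas):
    """One generic DDPM reverse step: estimate x_{t-1} from x_t and
    the predicted noise eps_pred, using a fixed beta schedule."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t > 0:
        # Add stochasticity for all but the final step.
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
    return mean

def predict_noise(x_t, t, source_texture, target_geometry):
    """Toy multi-conditional noise predictor. A real model would attend
    over texture and geometry conditions; here the conditions are simply
    averaged to show how both cues enter the denoising process."""
    cond = 0.5 * (source_texture + target_geometry)
    return x_t - cond  # illustrative, not a learned network

T = 10
betas = np.linspace(1e-4, 0.02, T)
x = rng.standard_normal((4, 4))      # noisy "image"
tex = np.ones((4, 4))                # texture condition (appearance cue)
geo = np.zeros((4, 4))               # geometry condition (rendered 3DMM cue)
for t in reversed(range(T)):
    x = denoise_step(x, t, predict_noise(x, t, tex, geo), betas)
print(x.shape)  # (4, 4)
```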