Talking Face Generation

37 papers with code • 2 benchmarks • 6 datasets

Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.

(Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)

Latest papers with no code

FT2TF: First-Person Statement Text-To-Talking Face Generation

no code yet • 9 Dec 2023

This achievement highlights our model's capability to bridge first-person statements and dynamic face generation, providing insightful guidance for future work.

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

no code yet • 15 Nov 2023

This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference to synthesize a photo-realistic talking video of that person, with head pose controlled by a short video clip and eye blinks driven by a learned embedding.
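
Abstracts in this area usually leave the interface implicit; the sketch below is one minimal PyTorch reading of the inputs CP-EB names (audio, reference image, pose clip, blink embedding) and a video output. All module names, shapes, and the toy decoder are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: CP-EB's actual architecture is not public.
# Shapes and module names are assumptions chosen for readability.
class TalkingFaceGenerator(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.audio_enc = nn.GRU(input_size=80, hidden_size=d, batch_first=True)  # mel frames -> content
        self.id_enc = nn.Sequential(nn.Conv2d(3, d, 4, 4), nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.pose_enc = nn.GRU(input_size=6, hidden_size=d, batch_first=True)    # per-frame head pose
        self.blink_emb = nn.Linear(1, d)                                         # eye-openness scalar
        self.decoder = nn.Sequential(nn.Linear(4 * d, 3 * 64 * 64), nn.Tanh())   # toy frame decoder

    def forward(self, mel, id_img, pose_seq, blink):
        a, _ = self.audio_enc(mel)       # (B, T, d)
        i = self.id_enc(id_img)          # (B, d)
        p, _ = self.pose_enc(pose_seq)   # (B, T, d)
        b = self.blink_emb(blink)        # (B, T, d)
        T = a.shape[1]
        fused = torch.cat([a, p, b, i.unsqueeze(1).expand(-1, T, -1)], dim=-1)
        frames = self.decoder(fused)     # (B, T, 3*64*64)
        return frames.view(-1, T, 3, 64, 64)

gen = TalkingFaceGenerator()
video = gen(torch.randn(2, 25, 80), torch.randn(2, 3, 64, 64),
            torch.randn(2, 25, 6), torch.rand(2, 25, 1))
print(video.shape)  # torch.Size([2, 25, 3, 64, 64])
```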

ToonTalker: Cross-Domain Face Reenactment

no code yet • ICCV 2023

Moreover, since no paired data is provided, we propose a novel cross-domain training scheme using data from two domains with the designed analogy constraint.
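
The excerpt does not define the analogy constraint, so the following is only one plausible reading: the latent motion between two frames of one domain should match the motion between the corresponding frames of the other domain. The encoder and loss below are stand-ins, not ToonTalker's components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Purely illustrative reading of an "analogy constraint" for unpaired
# cross-domain reenactment: the latent motion between two real frames
# should equal the latent motion between the two cartoon frames animated
# from them. The encoder below is a stand-in, not ToonTalker's.
motion_enc = nn.Sequential(nn.Conv2d(3, 32, 4, 4), nn.Flatten(),
                           nn.Linear(32 * 16 * 16, 64))

def analogy_loss(real_a, real_b, toon_a, toon_b):
    m_real = motion_enc(real_b) - motion_enc(real_a)  # motion in the real domain
    m_toon = motion_enc(toon_b) - motion_enc(toon_a)  # analogous motion in the toon domain
    return F.l1_loss(m_toon, m_real)

frames = [torch.randn(1, 3, 64, 64) for _ in range(4)]
print(analogy_loss(*frames))
```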

VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer

no code yet • 9 Aug 2023

With our essential designs for facial style learning, our model can flexibly capture the expressive facial style from arbitrary video prompts and transfer it onto a personalized image renderer in a zero-shot manner.
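
One hedged way to picture such zero-shot transfer is a style encoder that pools a style code from an arbitrary video prompt at inference time, with no fine-tuning, so the code can condition a renderer. Everything below is an illustrative assumption rather than VAST's design.

```python
import torch
import torch.nn as nn

# Illustrative only: a per-frame encoder plus temporal average pooling
# yields a fixed-size style code from a video prompt of any length.
class StyleEncoder(nn.Module):
    def __init__(self, feat=128, d=64):
        super().__init__()
        self.frame_enc = nn.Sequential(nn.Conv2d(3, feat, 4, 4),
                                       nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.proj = nn.Linear(feat, d)

    def forward(self, prompt_video):                    # (B, T, 3, H, W)
        b, t = prompt_video.shape[:2]
        f = self.frame_enc(prompt_video.flatten(0, 1))  # (B*T, feat)
        f = f.view(b, t, -1).mean(dim=1)                # temporal average pool
        return self.proj(f)                             # (B, d) style code

enc = StyleEncoder()
style = enc(torch.randn(2, 16, 3, 64, 64))
print(style.shape)  # torch.Size([2, 64]) -> would condition the renderer
```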

Audio-driven Talking Face Generation by Overcoming Unintended Information Flow

no code yet • 18 Jul 2023

Specifically, this involves unintended flow of lip, pose and other information from the reference to the generated image, as well as instabilities during model training.
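
For context, a common mitigation for such leakage in earlier lip-sync work (e.g., Wav2Lip) is to mask the mouth region of the reference so that lip shape must come from the audio; the snippet below only makes the failure mode concrete and is not necessarily what this paper proposes.

```python
import torch

# Common mitigation from prior lip-sync work (e.g., Wav2Lip): zeroing the
# lower half of the reference frame removes the lip cue, forcing the
# generator to recover mouth shape from the audio stream instead.
def mask_mouth(reference: torch.Tensor) -> torch.Tensor:
    """Zero out the lower half of a (B, 3, H, W) reference frame."""
    masked = reference.clone()
    h = reference.shape[-2]
    masked[..., h // 2:, :] = 0.0
    return masked

ref = torch.rand(4, 3, 96, 96)
print(mask_mouth(ref).shape)  # torch.Size([4, 3, 96, 96])
```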

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

no code yet • 8 Jul 2023

DeepFake-based digital facial forgery threatens public media security, especially when lip manipulation is used in talking face generation, which makes fake video detection even more difficult.

Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions

no code yet • 19 Jun 2023

Given a short speech video, we first build an efficient talking radiance field, then apply the latest conditional diffusion model for instruction-based image editing, guiding the optimization of the implicit representation towards the editing target.
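
The pipeline reads like the iterative scheme popularized by Instruct-NeRF2NeRF: render a view from the radiance field, edit the render with an instruction-conditioned diffusion model, and use the edit as supervision for the field. The loop below is schematic; `render` and `diffusion_edit` are placeholder callables, not the paper's components.

```python
import torch

# Schematic editing loop under stated assumptions; no names below come
# from the paper itself.
def edit_radiance_field(radiance_field, render, diffusion_edit, poses,
                        instruction, steps=1000, lr=1e-3):
    opt = torch.optim.Adam(radiance_field.parameters(), lr=lr)
    for step in range(steps):
        pose = poses[step % len(poses)]
        rendered = render(radiance_field, pose)             # current view
        with torch.no_grad():
            target = diffusion_edit(rendered, instruction)  # instruction-edited view
        loss = torch.nn.functional.l1_loss(rendered, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return radiance_field

# Toy usage with stand-in components.
field = torch.nn.Linear(3, 3)
render = lambda f, p: f(p).sigmoid()
diffusion_edit = lambda img, text: img * 0.5  # dummy "edit"
poses = [torch.randn(3) for _ in range(4)]
edit_radiance_field(field, render, diffusion_edit, poses, "make him smile", steps=10)
```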

Exploring Phonetic Context-Aware Lip-Sync For Talking Face Generation

no code yet • 31 May 2023

The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion.
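
A minimal way to make lip motion context-aware is to let each output step see a window of neighboring audio frames, e.g. via a 1-D convolution, rather than a single frame. The window size and feature dimensions below are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

# Minimal sketch of context-aware lip motion: each output step sees
# 2*context + 1 frames of phonetic context through a 1-D convolution.
class ContextLipMotion(nn.Module):
    def __init__(self, n_mels=80, d=128, context=5):
        super().__init__()
        self.conv = nn.Conv1d(n_mels, d, kernel_size=2 * context + 1,
                              padding=context)
        self.head = nn.Linear(d, 20)  # e.g. 20 lip landmark offsets

    def forward(self, mel):                  # mel: (B, T, n_mels)
        h = self.conv(mel.transpose(1, 2))   # (B, d, T)
        return self.head(h.transpose(1, 2))  # (B, T, 20) lip motion units

model = ContextLipMotion()
print(model(torch.randn(2, 100, 80)).shape)  # torch.Size([2, 100, 20])
```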

CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation

no code yet • 23 May 2023

Recently, talking face generation has drawn ever-increasing attention from the computer vision research community due to its arduous challenges and widespread application scenarios, e.g., movie animation and virtual anchors.

Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator

no code yet • 4 May 2023

More specifically, given a textured face as the source and a rendered face projected from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model decomposes the complex transfer problem into a multi-conditional denoising process. Within it, a Texture Attention-based module accurately models the correspondences between the appearance and geometry cues contained in the source and target conditions, and incorporates extra implicit information for high-fidelity talking face generation.
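
A hedged sketch of what such a correspondence module could look like is a cross-attention block in which tokens from the rendered-geometry target query tokens from the source texture, so appearance is borrowed where the geometry corresponds. This is a generic layer built on torch.nn.MultiheadAttention, not the paper's Texture Attention module.

```python
import torch
import torch.nn as nn

# Generic texture/geometry cross-attention: geometry tokens query texture
# tokens; residual connection plus LayerNorm, as in standard transformers.
class TextureAttention(nn.Module):
    def __init__(self, d=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, geom_tokens, tex_tokens):
        # geom_tokens: (B, Ng, d) from the rendered 3DMM target
        # tex_tokens:  (B, Nt, d) from the source texture image
        out, _ = self.attn(query=geom_tokens, key=tex_tokens, value=tex_tokens)
        return self.norm(geom_tokens + out)

block = TextureAttention()
print(block(torch.randn(2, 196, 256), torch.randn(2, 196, 256)).shape)
# torch.Size([2, 196, 256])
```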