Talking Face Generation

37 papers with code • 2 benchmarks • 6 datasets

Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics.

(Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)

Most implemented papers

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

Hangz-nju-cuhk/Talking-Face_PC-AVS CVPR 2021

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code can be complementarily learned within a modulated convolution-based reconstruction framework.
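
To make the "modulated convolution" building block concrete, here is a minimal PyTorch sketch of a StyleGAN2-style modulated convolution, where a per-sample style vector (imagined here as carrying the pose/identity code) scales the kernel weights before they are applied. The module name, shapes, and the role assigned to the style vector are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv2d(nn.Module):
    """StyleGAN2-style convolution whose weights are rescaled per sample by a
    style vector (assumed here to carry a pose/identity code)."""
    def __init__(self, in_ch, out_ch, style_dim, kernel=3, eps=1e-8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, kernel, kernel))
        self.affine = nn.Linear(style_dim, in_ch)  # style -> per-channel scale
        self.eps = eps
        self.pad = kernel // 2

    def forward(self, x, style):
        b, in_ch, h, w = x.shape
        scale = self.affine(style).view(b, 1, in_ch, 1, 1)
        weight = self.weight.unsqueeze(0) * scale                  # modulate
        demod = torch.rsqrt(weight.pow(2).sum(dim=(2, 3, 4), keepdim=True) + self.eps)
        weight = weight * demod                                    # demodulate
        out_ch = weight.shape[1]
        # fold the batch into conv groups so each sample uses its own kernels
        x = x.reshape(1, b * in_ch, h, w)
        weight = weight.reshape(b * out_ch, in_ch, *weight.shape[3:])
        out = F.conv2d(x, weight, padding=self.pad, groups=b)
        return out.reshape(b, out_ch, h, w)
```

For example, `ModulatedConv2d(64, 64, style_dim=12)` applied to a `(2, 64, 32, 32)` feature map and a `(2, 12)` style batch yields a `(2, 64, 32, 32)` output, with each sample reconstructed under its own code.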

Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary

sibozhang/Text2Video 29 Apr 2021

With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic.

Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset

MRzzm/HDTF CVPR 2021

To synthesize high-definition videos, we build a large in-the-wild high-resolution audio-visual dataset and propose a novel flow-guided talking face generation framework.
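
As a rough illustration of what "flow-guided" generation involves, the sketch below warps reference face features with a dense flow field via `grid_sample`; the function, tensor shapes, and the idea that the flow is predicted from audio/pose are assumptions for illustration, not the HDTF authors' implementation.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(feat, flow):
    """feat: (B, C, H, W) reference features; flow: (B, 2, H, W) offsets in pixels."""
    b, _, h, w = feat.shape
    # base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(feat.device)   # (H, W, 2)
    grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)            # add the flow
    # normalize to [-1, 1] as grid_sample expects
    grid[..., 0] = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, grid, align_corners=True)
```

A flow network would output `flow`, and the warped reference features would then typically be passed to a refinement generator that fills in occluded regions.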

Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

tpulkit/txt2vid 26 Jun 2021

Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure.

Parallel and High-Fidelity Text-to-Lip Generation

Dianezzy/ParaLip 14 Jul 2021

However, autoregressive (AR) decoding generates the current lip frame conditioned on previously generated frames, which inherently limits inference speed and also degrades the quality of the generated lip frames through error propagation.
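
The contrast can be illustrated schematically: an autoregressive loop feeds each generated lip frame back into the decoder, while a parallel decoder predicts all frames in one pass. The `decoder` and `length_predictor` callables below are hypothetical placeholders, not ParaLip's actual modules.

```python
import torch

@torch.no_grad()
def decode_autoregressive(decoder, text_feats, n_frames, frame_dim):
    # Each frame conditions on previously generated ones: O(T) sequential
    # steps, and early mistakes propagate into later frames.
    frames = []
    prev = torch.zeros(text_feats.size(0), 1, frame_dim)
    for _ in range(n_frames):
        nxt = decoder(text_feats, prev)        # predict one frame
        frames.append(nxt)
        prev = torch.cat([prev, nxt], dim=1)   # feed it back in
    return torch.cat(frames, dim=1)

@torch.no_grad()
def decode_parallel(decoder, length_predictor, text_feats, frame_dim):
    # All frames are predicted in a single forward pass from the text features
    # (plus a predicted target length), removing the sequential bottleneck.
    n_frames = int(length_predictor(text_feats))
    queries = torch.zeros(text_feats.size(0), n_frames, frame_dim)
    return decoder(text_feats, queries)        # one shot, no feedback loop
```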

FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning

zhangchenxu528/FACIAL ICCV 2021

In this paper, we propose a talking face generation method that takes an audio signal as input and a short target video clip as reference, and synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in sync with the input audio signal.

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation

YuanxunLu/LiveSpeechPortraits 22 Sep 2021

The first stage is a deep neural network that extracts deep audio features and applies a manifold projection to map them into the target person's speech space.
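
A minimal sketch of such a manifold projection, assuming an LLE-style reconstruction from the K nearest features in a database of the target person's own speech; the function name, the number of neighbors, and the regularization constant are illustrative, not the LiveSpeechPortraits code.

```python
import torch

def project_to_speech_manifold(feat, database, k=10, reg=1e-3):
    """feat: (D,) query audio feature; database: (N, D) target-person features."""
    dists = torch.cdist(feat[None, None], database[None]).squeeze()  # (N,)
    idx = dists.topk(k, largest=False).indices
    neighbors = database[idx]                                        # (K, D)
    # Solve for barycentric weights w minimizing ||feat - w @ neighbors||
    # with sum(w) = 1 (standard locally-linear-embedding reconstruction).
    Z = neighbors - feat[None]                                       # (K, D)
    C = Z @ Z.T + reg * torch.eye(k)                                 # (K, K)
    w = torch.linalg.solve(C, torch.ones(k))
    w = w / w.sum()
    return w @ neighbors   # feature pulled onto the target person's manifold
```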

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

FeiiYin/StyleHEAT 8 Mar 2022

Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution.

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

deeplsd/merkel-podcast-corpus 24 May 2022

We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel.

Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

sstzal/DFRF 24 Jul 2022

Thus, the facial radiance field can be flexibly adjusted to a new identity with only a few reference images.
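
A minimal sketch of how a conditional facial radiance field might be queried: an MLP maps each 3D sample point to colour and density, conditioned on an audio feature and on features pooled from the few reference images, which is what allows adaptation to a new identity. All module names and dimensions are assumptions for illustration, not the DFRF implementation.

```python
import torch
import torch.nn as nn

class ConditionalRadianceField(nn.Module):
    def __init__(self, pos_dim=63, audio_dim=64, ref_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + audio_dim + ref_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),          # RGB + density per sampled point
        )

    def forward(self, xyz_enc, audio_feat, ref_feat):
        # xyz_enc: (N, pos_dim) positionally encoded sample points
        # audio_feat: (audio_dim,) current audio window
        # ref_feat: (ref_dim,) features pooled from the few reference images
        n = xyz_enc.shape[0]
        cond = torch.cat([audio_feat, ref_feat]).expand(n, -1)
        rgb_sigma = self.mlp(torch.cat([xyz_enc, cond], dim=-1))
        rgb, sigma = rgb_sigma[:, :3].sigmoid(), rgb_sigma[:, 3:].relu()
        return rgb, sigma   # composited along rays by standard volume rendering
```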