Talking Face Generation
37 papers with code • 2 benchmarks • 6 datasets
Talking face generation aims to synthesize a sequence of face images that corresponds to given speech semantics.
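Abstractly, the task maps a reference identity image plus per-frame audio features to a sequence of face frames. The sketch below illustrates only that interface; the function name, shapes, and the energy-based "lip motion" are invented stand-ins for a learned generator, not any listed paper's method.

```python
import numpy as np

def generate_talking_face(reference_image: np.ndarray,
                          audio_features: np.ndarray) -> np.ndarray:
    """Toy stand-in for a talking-face generator (illustrative only).

    reference_image: (H, W, 3) identity frame.
    audio_features:  (T, D) one feature vector per output video frame.
    Returns:         (T, H, W, 3) frame sequence, one frame per audio step.

    Real systems replace this body with a learned network that animates
    the lips (and possibly head pose and blinks) in sync with the audio.
    """
    T = audio_features.shape[0]
    frames = np.repeat(reference_image[None], T, axis=0).astype(np.float32)
    # Fake "lip motion": modulate the lower half of the face by the audio
    # energy at each step (a real model predicts this motion instead).
    energy = np.abs(audio_features).mean(axis=1)            # (T,)
    h = reference_image.shape[0]
    frames[:, h // 2:] *= (1.0 + 0.1 * energy)[:, None, None, None]
    return frames

ref = np.ones((64, 64, 3), dtype=np.float32)
audio = np.random.randn(25, 80)   # e.g. 25 frames of 80-dim mel features
video = generate_talking_face(ref, audio)
print(video.shape)                # (25, 64, 64, 3)
```

The point of the interface is the one-to-one pairing of audio steps and video frames; the papers below differ mainly in what else (pose, resolution, identity) they control.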
(Image credit: Talking Face Generation by Adversarially Disentangled Audio-Visual Representation)
Most implemented papers
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
While speech content can be captured by learning the intrinsic synchronization between the audio and visual modalities, we identify that a complementary pose code can be learned within a modulated-convolution-based reconstruction framework.
Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
With the advance of deep learning technology, automatic video generation from audio or text has become an emerging and promising research topic.
Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
To synthesize high-definition videos, we build a large in-the-wild high-resolution audio-visual dataset and propose a novel flow-guided talking face generation framework.
Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text
Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure.
Parallel and High-Fidelity Text-to-Lip Generation
However, autoregressive (AR) decoding generates the current lip frame conditioned on previously generated frames, which inherently limits inference speed and also degrades the quality of the generated lip frames through error propagation.
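The speed and error-propagation argument can be illustrated with a toy sequence model: an AR decoder conditions each step on its own (possibly erroneous) previous output, while a parallel decoder predicts every step directly from the conditioning signal. Both functions below are schematic assumptions for illustration, not the paper's model.

```python
import numpy as np

def ar_decode(cond, steps, noise=0.1, seed=0):
    """Autoregressive: each output depends on the previously *generated*
    output, so per-step noise compounds over time (error propagation),
    and the loop is inherently sequential (O(T) latency)."""
    rng = np.random.default_rng(seed)
    out, prev = [], cond[0]
    for t in range(steps):
        prev = 0.9 * prev + 0.1 * cond[t] + noise * rng.standard_normal()
        out.append(prev)
    return np.array(out)

def parallel_decode(cond, steps, noise=0.1, seed=0):
    """Non-autoregressive: every output is predicted directly from the
    conditioning (text/audio) signal, so per-step errors stay independent
    and all steps can be computed at once."""
    rng = np.random.default_rng(seed)
    return cond[:steps] + noise * rng.standard_normal(steps)

cond = np.linspace(0.0, 1.0, 50)   # stand-in conditioning signal
ar = ar_decode(cond, 50)
par = parallel_decode(cond, 50)
# Averaged over seeds, the AR outputs drift further from the conditioning
# signal than the parallel ones, mirroring the error-propagation claim.
```

Averaging the deviation from `cond` over many seeds shows the AR trace accumulating both lag and noise, while the parallel trace's error stays flat.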
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
In this paper, we propose a talking face generation method that takes an audio signal as input and a short target video clip as reference, and synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in-sync with the input audio signal.
Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation
The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space.
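The "manifold projection" idea of snapping extracted features onto a target person's speech space can be approximated generically by reconstructing each feature from its nearest neighbors in a bank of that person's features. The k-NN averaging below is a hedged sketch of that general idea; the function name, shapes, and feature bank are all invented for illustration.

```python
import numpy as np

def manifold_project(features, target_bank, k=3):
    """Project each feature onto the target speaker's feature manifold,
    approximated here as the mean of its k nearest bank entries."""
    projected = np.empty_like(features)
    for i, f in enumerate(features):
        d = np.linalg.norm(target_bank - f, axis=1)   # distance to each bank entry
        nn = np.argsort(d)[:k]                        # indices of k nearest neighbors
        projected[i] = target_bank[nn].mean(axis=0)   # local average on the manifold
    return projected

rng = np.random.default_rng(0)
bank = rng.standard_normal((200, 16))    # features collected from the target person
feats = rng.standard_normal((10, 16))    # new audio features to be projected
proj = manifold_project(feats, bank)
print(proj.shape)                        # (10, 16)
```

With `k=1` the projection returns the single closest target feature unchanged, which is a useful sanity check that projected features always come from the target's space.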
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN
Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution.
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
We introduce the Merkel Podcast Corpus, an audio-visual-text corpus in German collected from 16 years of (almost) weekly Internet podcasts of former German chancellor Angela Merkel.
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis
Thus the facial radiance field can be flexibly adjusted to the new identity with few reference images.