3D Face Animation
21 papers with code • 3 benchmarks • 6 datasets
(Image credit: Cudeiro et al.)
Latest papers
CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior
In this paper, we propose to cast speech-driven facial animation as a code-query task in the finite proxy space of a learned codebook, which promotes vividness in the generated motions by reducing cross-modal mapping uncertainty.
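The mechanism here is a vector-quantization lookup: a continuous motion feature is snapped to its nearest entry in a finite, learned codebook, so generation becomes a discrete code query rather than unconstrained regression. Below is a minimal sketch of just that query step (PyTorch, with assumed tensor sizes), not CodeTalker's released code.

```python
import torch

# Hypothetical learned codebook: 256 motion codes, 64 dims each (assumed sizes)
codebook = torch.randn(256, 64)

def quantize(features: torch.Tensor) -> torch.Tensor:
    """Snap each 64-dim feature to its nearest codebook entry."""
    dists = torch.cdist(features, codebook)   # (N, 256) pairwise distances
    idx = dists.argmin(dim=-1)                # discrete code index per frame
    return codebook[idx]                      # (N, 64) quantized features

audio_features = torch.randn(10, 64)          # e.g. 10 frames of audio features
motion_tokens = quantize(audio_features)      # decoding proceeds in the proxy space
```

Confining the decoder's input to one of finitely many codes turns the one-to-many audio-to-motion ambiguity into picking an index, which is one way to read the reduced cross-modal mapping uncertainty.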
Generating Holistic 3D Human Motion from Speech
This work addresses the problem of generating 3D holistic body motions from human speech.
3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation
In contrast to the traditional avatar creation pipeline, which is costly, contemporary generative approaches learn the data distribution directly from photographs.
FaceFormer: Speech-Driven 3D Facial Animation with Transformers
Speech-driven 3D facial animation is challenging due to the complex geometry of human faces and the limited availability of 3D audio-visual data.
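As the title says, the mapping is transformer-based. Here is a minimal sketch, under assumed dimensions, of an autoregressive transformer decoder that attends to audio features and predicts per-frame vertex offsets; it illustrates the general recipe, not FaceFormer's exact architecture.

```python
import torch
import torch.nn as nn

class AudioToFaceDecoder(nn.Module):
    # Assumed sizes: 128-dim audio features, 5023 mesh vertices (FLAME-like)
    def __init__(self, audio_dim=128, n_vertices=5023, d=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d)         # audio -> decoder memory
        self.motion_proj = nn.Linear(n_vertices * 3, d)   # past motion -> queries
        layer = nn.TransformerDecoderLayer(d_model=d, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_vertices * 3)          # per-frame vertex offsets

    def forward(self, audio_feats, past_motion):
        # audio_feats: (B, T, audio_dim); past_motion: (B, T, n_vertices*3)
        memory = self.audio_proj(audio_feats)
        tgt = self.motion_proj(past_motion)
        T = tgt.size(1)
        # Causal mask keeps the prediction autoregressive over time
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                             # offsets from a neutral template
```

Cross-attention to the audio memory conditions each predicted frame on the utterance, while the causal mask keeps generation autoregressive.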
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning
In this paper, we propose a talking-face generation method that takes an audio signal as input and a short target video clip as reference, and synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks in sync with the input audio.
MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement
To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
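The title names the mechanism: full-face motion is routed through a latent space that separates audio-correlated motion (e.g. lips) from audio-uncorrelated motion (e.g. eye blinks). The sketch below shows one plausible way to build such a categorical latent with a Gumbel-softmax; module names and sizes are assumptions, not MeshTalk's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoricalFusion(nn.Module):
    # Assumed sizes: two 128-dim modality encodings, 16 heads of 128 categories
    def __init__(self, audio_dim=128, expr_dim=128, heads=16, n_classes=128):
        super().__init__()
        self.audio_enc = nn.Linear(audio_dim, heads * n_classes)  # audio-correlated stream
        self.expr_enc = nn.Linear(expr_dim, heads * n_classes)    # audio-uncorrelated stream
        self.heads, self.n_classes = heads, n_classes

    def forward(self, audio, expr):
        # Fuse both modalities into logits over discrete categories
        logits = self.audio_enc(audio) + self.expr_enc(expr)
        logits = logits.view(-1, self.heads, self.n_classes)
        # Gumbel-softmax: differentiable sampling from the categorical latent
        return F.gumbel_softmax(logits, tau=1.0, hard=True)
```

Once motion lives in a discrete latent like this, a learned prior could sample the audio-uncorrelated parts (blinks, brows) while the audio stream keeps the lips in sync.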
Learning an Animatable Detailed 3D Face Model from In-The-Wild Images
Some methods produce faces that cannot be realistically animated because they do not model how wrinkles vary with expression.
Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
In this paper, we address this problem with a deep neural network that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a high-quality synthesized talking-face video with personalized head pose (making use of the visual information in V) and with expression and lip motion synchronized to both A and V.
Capture, Learning, and Synthesis of 3D Speaking Styles
To address the scarcity of such data, we introduce a 4D face dataset with about 29 minutes of 4D scans captured at 60 fps, with synchronized audio, from 12 speakers.
Learning a model of facial shape and expression from 4D scans
FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model.
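FLAME's combination of low dimensionality and expressiveness comes from its linear structure: a mesh is a template plus identity and expression blendshape offsets, followed by pose-corrective blendshapes and linear blend skinning for the jaw, neck, and eyes. The sketch below covers only the linear blendshape part, with random stand-in bases; the parameter counts (300 shape, 100 expression) match the released model, everything else is illustrative.

```python
import torch

n_vertices = 5023                                      # FLAME's mesh resolution
template = torch.zeros(n_vertices, 3)                  # mean face (stand-in)
shape_basis = torch.randn(300, n_vertices, 3) * 1e-3   # identity components (stand-in)
expr_basis = torch.randn(100, n_vertices, 3) * 1e-3    # expression components (stand-in)

def flame_like(beta: torch.Tensor, psi: torch.Tensor) -> torch.Tensor:
    """A few hundred parameters -> full mesh: template + linear offsets."""
    v = template.clone()
    v += torch.einsum('s,svc->vc', beta, shape_basis)  # identity offsets
    v += torch.einsum('e,evc->vc', psi, expr_basis)    # expression offsets
    return v

verts = flame_like(torch.zeros(300), torch.zeros(100))  # neutral face, (5023, 3)
```

Pose articulation (jaw opening, head and eye rotation) is handled separately by skinning, which the sketch omits.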