Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video

8 Dec 2023  ·  Yuchen Rao, Eduardo Perez Pellitero, Benjamin Busam, Yiren Zhou, Jifei Song

Recent advances in 3D avatar generation achieve photorealistic models when multi-view supervision is available, but monocular counterparts lag in quality despite their broader applicability. We propose ReCaLaB to close this gap: a fully differentiable pipeline that learns high-fidelity 3D human avatars from just a single RGB video. A pose-conditioned deformable NeRF is optimized to volumetrically represent a human subject in a canonical T-pose. The canonical representation is then leveraged to efficiently associate neural textures using 2D-3D correspondences. This enables the separation of diffuse color generation and lighting correction branches that jointly compose an RGB prediction. The design allows intermediate results for human pose, body shape, texture, and lighting to be controlled with text prompts. An image-conditioned diffusion model then helps to animate the appearance and pose of the 3D avatar, creating video sequences with previously unseen human motion. Extensive experiments show that ReCaLaB outperforms previous monocular approaches in image quality on image synthesis tasks. Moreover, natural language offers an intuitive user interface for creative manipulation of 3D human avatars.
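For intuition, here is a minimal PyTorch sketch of what such a two-branch appearance head could look like: a view-independent diffuse color branch and a lighting-correction branch whose outputs are composed into the final RGB. The class name `TwoBranchAppearance`, the layer sizes, and the multiplicative composition of the two branches are illustrative assumptions; the abstract does not specify these details of ReCaLaB's actual architecture.

```python
import torch
import torch.nn as nn

class TwoBranchAppearance(nn.Module):
    """Illustrative two-branch appearance head (hypothetical, not the paper's code):
    a diffuse color branch plus a lighting-correction branch, composed into RGB."""

    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        # Diffuse branch: view-independent albedo predicted from a
        # neural-texture feature fetched via 2D-3D correspondences.
        self.diffuse = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),          # albedo in [0, 1]
        )
        # Lighting branch: view-dependent multiplicative shading correction.
        self.lighting = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),  # feature + view direction
            nn.Linear(hidden, 3), nn.Softplus(),          # positive shading factor
        )

    def forward(self, tex_feat, view_dir):
        albedo = self.diffuse(tex_feat)                                   # (N, 3)
        shading = self.lighting(torch.cat([tex_feat, view_dir], dim=-1))  # (N, 3)
        return albedo * shading  # composed RGB prediction

# Minimal usage with placeholder inputs standing in for per-sample
# neural-texture features and ray view directions.
if __name__ == "__main__":
    head = TwoBranchAppearance()
    tex_feat = torch.randn(1024, 32)
    view_dir = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
    rgb = head(tex_feat, view_dir)
    print(rgb.shape)  # torch.Size([1024, 3])
```

Separating albedo from a multiplicative shading term is one common way to realize such a split; it lets the texture branch stay lighting-agnostic so that text-driven edits to color or texture do not bake in the training illumination.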
