Talk-to-Edit: Fine-Grained Facial Editing via Dialog

Facial editing is an important task in vision and graphics with numerous applications. However, existing works are unable to deliver a continuous and fine-grained editing mode (e.g., editing a slightly smiling face into a big laughing one) with natural interactions with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Our key insight is to model a continual "semantic field" in the GAN latent space. 1) Unlike previous works that regard editing as traversing straight lines in the latent space, here fine-grained editing is formulated as finding a curving trajectory that respects the fine-grained attribute landscape on the semantic field. 2) The curvature at each step is location-specific and determined by the input image as well as the user's language requests. 3) To engage users in a meaningful dialog, our system generates language feedback by considering both the user request and the current state of the semantic field. We also contribute CelebA-Dialog, a visual-language facial editing dataset to facilitate large-scale study. Specifically, each image has manually annotated fine-grained attribute labels as well as template-based textual descriptions in natural language. Extensive quantitative and qualitative experiments demonstrate the superiority of our framework in terms of 1) the smoothness of fine-grained editing, 2) identity/attribute preservation, and 3) visual photorealism and dialog fluency. Notably, a user study validates that our overall system is consistently favored by around 80% of the participants. Our project page is https://www.mmlab-ntu.com/project/talkedit/.
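The contrast the abstract draws between straight-line traversal and a location-specific trajectory can be illustrated with a small sketch. This is not the authors' implementation: the two-dimensional latent space, the hand-written `toy_field` function, and the fixed step size are all assumptions made for illustration. In the paper the field is modeled in the GAN latent space and also depends on the user's language request.

```python
import numpy as np

def linear_edit(z, direction, step, n_steps):
    """Baseline: move along one fixed direction, so the path is a straight line."""
    d = direction / np.linalg.norm(direction)
    return np.stack([z + step * k * d for k in range(n_steps + 1)])

def field_edit(z, field, step, n_steps):
    """Follow a location-dependent field: the direction is re-evaluated at every step."""
    path = [z]
    for _ in range(n_steps):
        f = field(path[-1])
        path.append(path[-1] + step * f / np.linalg.norm(f))
    return np.stack(path)

def toy_score(z):
    """Hypothetical attribute score, chosen only so that its gradient varies with z."""
    return z[0] + 0.5 * z[1] ** 2

def toy_field(z):
    """Hypothetical semantic field: the gradient of toy_score at z."""
    return np.array([1.0, z[1]])

z0 = np.array([0.0, 0.5])
straight = linear_edit(z0, toy_field(z0), step=0.1, n_steps=20)
curved = field_edit(z0, toy_field, step=0.1, n_steps=20)
```

Both paths start at the same point and take steps of the same length. The first keeps the direction computed at `z0`, while the second bends as the field changes along the way, which is the behavior the abstract describes as a curving trajectory.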

ICCV 2021

Datasets


Introduced in the Paper:

CelebA-Dialog

Used in the Paper:

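CelebA

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Fine-Grained Facial Editing | CelebA-Dialog | Talk-to-Edit | Bangs | 0.5276 / 0.2902 | #1 |
| Fine-Grained Facial Editing | CelebA-Dialog | Talk-to-Edit | Eyeglasses | 0.6229 / 0.7720 | #1 |
| Fine-Grained Facial Editing | CelebA-Dialog | Talk-to-Edit | Beard | 0.7634 / 0.5425 | #2 |
| Fine-Grained Facial Editing | CelebA-Dialog | Talk-to-Edit | Smiling | 0.4580 / 0.3573 | #2 |
| Fine-Grained Facial Editing | CelebA-Dialog | Talk-to-Edit | Young | 0.6234 / 0.2731 | #2 |
| Fine-Grained Facial Editing | CelebA-Dialog | Multiclass SVM | Bangs | 0.7262 / 0.5387 | #2 |
| Fine-Grained Facial Editing | CelebA-Dialog | Multiclass SVM | Eyeglasses | 0.6967 / 0.9046 | #2 |
| Fine-Grained Facial Editing | CelebA-Dialog | Multiclass SVM | Beard | 1.1098 / 1.7361 | #1 |
| Fine-Grained Facial Editing | CelebA-Dialog | Multiclass SVM | Smiling | 0.7959 / 0.8676 | #1 |
| Fine-Grained Facial Editing | CelebA-Dialog | Multiclass SVM | Young | 0.7610 / 1.3866 | #1 |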
