Text-guided 3D Human Generation from 2D Collections

23 May 2023  ·  Tsu-Jui Fu, Wenhan Xiong, Yixin Nie, Jingyu Liu, Barlas Oğuz, William Yang Wang

3D human modeling has been widely used for engaging interaction in gaming, film, and animation. The customization of these characters is crucial for creativity and scalability, which highlights the importance of controllability. In this work, we introduce Text-guided 3D Human Generation (T3H), where a model generates a 3D human guided by a fashion description. There are two goals: 1) the 3D human should render articulately, and 2) its outfit should be controlled by the given text. To address this T3H task, we propose Compositional Cross-modal Human (CCH). CCH adopts cross-modal attention to fuse compositional human rendering with the extracted fashion semantics, so that each human body part attends to the textual guidance relevant to its visual patterns. We incorporate a human prior and semantic discrimination to enhance 3D geometry transformation and fine-grained consistency, enabling CCH to learn from 2D collections for data efficiency. We conduct evaluations on DeepFashion and SHHQ with diverse fashion attributes covering the shape, fabric, and color of upper and lower clothing. Extensive experiments demonstrate that CCH achieves superior results for T3H with high efficiency.
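
A minimal sketch (not the authors' code) of the cross-modal attention idea described above: per-body-part visual features act as queries over the encoded tokens of the fashion description, so each body part gathers its relevant textual guidance. The module name, feature dimension, and part count below are illustrative assumptions.

import torch
import torch.nn as nn

class PartTextCrossAttention(nn.Module):
    """Fuse per-body-part features with fashion-description tokens (illustrative sketch)."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, part_feats: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        # part_feats:  (B, P, dim) features for P human body parts
        # text_tokens: (B, T, dim) encoded tokens of the fashion description
        fused, _ = self.attn(query=part_feats, key=text_tokens, value=text_tokens)
        return self.norm(part_feats + fused)  # residual fusion per body part

# Example: 24 SMPL-style body parts attending over a 16-token description (assumed sizes).
parts = torch.randn(2, 24, 256)
tokens = torch.randn(2, 16, 256)
fused = PartTextCrossAttention()(parts, tokens)  # -> (2, 24, 256)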


Datasets

DeepFashion, SHHQ

Results from the Paper


Task: Text-to-3D-Human Generation  ·  Model: CCH

Dataset       Metric                            Value     Global Rank
DeepFashion   Fréchet Inception Distance        22.175    #1
DeepFashion   Percentage of Correct Keypoints   88.313    #1
DeepFashion   CLIP Score                        25.031    #1
DeepFashion   Fashion Accuracy                  72.038    #1
DeepFashion   Depth Error                       1.21      #1
SHHQ          Fréchet Inception Distance        33.348    #1
SHHQ          Percentage of Correct Keypoints   87.879    #1
SHHQ          CLIP Score                        27.873    #1
SHHQ          Fashion Accuracy                  76.194    #1
SHHQ          Depth Error                       1.67      #1
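
For reference, the CLIP Score above measures image-text agreement between rendered views and the fashion description. Below is a minimal sketch of how such a score can be computed with an off-the-shelf CLIP model from Hugging Face transformers; the checkpoint choice and the 100x scaling are assumptions, not the paper's exact evaluation protocol.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf CLIP used only for scoring image-text agreement (assumed checkpoint).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, caption: str) -> float:
    # Encode the rendered view and the fashion description, then take cosine similarity.
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return 100.0 * (img * txt).sum().item()  # scaled cosine similarity, roughly in [0, 100]

# Example usage (hypothetical file name): score a frontal render against its description.
# score = clip_score(Image.open("render_front.png"), "a lady wearing a sleeveless cotton top")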

Methods


Compositional Cross-modal Human (CCH): cross-modal attention that fuses compositional human rendering with extracted fashion semantics (introduced in this paper).