TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Human-Object Interaction Detection	HICO-DET	STIP (ResNet-50)	mAP	32.22	# 21
Human-Object Interaction Detection	HICO-DET	STIP (ResNet-50)	Time Per Frame (ms)	74	# 8
Human-Object Interaction Detection	V-COCO	STIP	AP(S1)	66.0	# 3
Human-Object Interaction Detection	V-COCO	STIP	Time Per Frame (ms)	74	# 8
Human-Object Interaction Detection	V-COCO	STIP	AP(S2)	70.7	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/exploring-structure-aware-transformer-over-1/human-object-interaction-detection-on-v-coco)](https://paperswithcode.com/sota/human-object-interaction-detection-on-v-coco?p=exploring-structure-aware-transformer-over-1)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/exploring-structure-aware-transformer-over-1/human-object-interaction-detection-on-hico)](https://paperswithcode.com/sota/human-object-interaction-detection-on-hico?p=exploring-structure-aware-transformer-over-1)`

Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection

CVPR 2022 · Yong Zhang, Yingwei Pan, Ting Yao, Rui Huang, Tao Mei, Chang-Wen Chen

Recent high-performing Human-Object Interaction (HOI) detection techniques have been strongly influenced by the Transformer-based object detector DETR. Nevertheless, most of them directly map parametric interaction queries into a set of HOI predictions through a vanilla Transformer in a one-stage manner, which leaves the rich inter- and intra-interaction structure under-exploited. In this work, we design a novel Transformer-style HOI detector, Structure-aware Transformer over Interaction Proposals (STIP). This design decomposes HOI set prediction into two sequential phases: interaction proposals are generated first, and the non-parametric interaction proposals are then transformed into HOI predictions by a structure-aware Transformer. The structure-aware Transformer upgrades the vanilla Transformer by additionally encoding the holistic semantic structure among interaction proposals as well as the local spatial structure of the human and object within each interaction proposal, so as to strengthen HOI predictions. Extensive experiments on the V-COCO and HICO-DET benchmarks demonstrate the effectiveness of STIP, which achieves superior results compared with state-of-the-art HOI detectors. Source code is available at https://github.com/zyong812/STIP.
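To make the first of the two phases more concrete, here is a minimal sketch of interaction proposal generation. It is not the STIP implementation. The names `Detection`, `InteractionProposal`, `iou`, `generate_proposals`, and `top_k` are hypothetical. The abstract does not say how proposals are scored, so the score used here (detector confidences weighted by box overlap) is only a placeholder assumption. The second phase, the structure-aware Transformer, is not sketched.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    label: str
    box: Box
    score: float  # detector confidence in [0, 1]


@dataclass
class InteractionProposal:
    human: Detection
    obj: Detection
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def generate_proposals(detections: List[Detection], top_k: int = 2) -> List[InteractionProposal]:
    """Pair every detected person with every other detection and keep the top_k pairs."""
    humans = [d for d in detections if d.label == "person"]
    proposals = []
    for human in humans:
        for obj in detections:
            if obj is human:
                continue  # a detection is never paired with itself
            # Placeholder score (an assumption, not the paper's scoring):
            # product of confidences, boosted when the two boxes overlap.
            score = human.score * obj.score * (0.5 + 0.5 * iou(human.box, obj.box))
            proposals.append(InteractionProposal(human, obj, score))
    proposals.sort(key=lambda p: p.score, reverse=True)
    return proposals[:top_k]


# Toy detections: one person, one overlapping bicycle, one distant dog.
detections = [
    Detection("person", (0.0, 0.0, 10.0, 10.0), 0.9),
    Detection("bicycle", (5.0, 5.0, 15.0, 15.0), 0.8),
    Detection("dog", (100.0, 100.0, 110.0, 110.0), 0.7),
]

for p in generate_proposals(detections, top_k=1):
    print(f"{p.human.label} -> {p.obj.label}")
```

In this toy scene the person-bicycle pair ranks above the person-dog pair because its boxes overlap. The selected proposals would then be passed to the second phase for HOI prediction.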


Code


zyong812/stip official

Tasks


Human-Object Interaction Detection

Object

Datasets

MS COCO

HICO-DET

V-COCO

Results from the Paper


Ranked #3 on Human-Object Interaction Detection on V-COCO


Methods


Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
