ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection

14 Aug 2020 · Ye Liu, Junsong Yuan, Chang Wen Chen

We consider the problem of Human-Object Interaction (HOI) detection, which aims to locate and recognize HOI instances of the form <human, action, object> in images. Most existing works treat HOIs as individual interaction categories and thus cannot handle the long-tail distribution of interactions or the polysemy of action labels. We argue that multi-level consistencies among objects, actions, and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs. Leveraging the compositional and relational peculiarities of HOI labels, we propose ConsNet, a knowledge-aware framework that explicitly encodes the relations among objects, actions, and interactions in an undirected graph called the consistency graph, and exploits Graph Attention Networks (GATs) to propagate knowledge among HOI categories as well as their constituents. Our model takes the visual features of candidate human-object pairs and the word embeddings of HOI labels as inputs, maps them into a visual-semantic joint embedding space, and obtains detection results by measuring their similarities. We extensively evaluate our model on the challenging V-COCO and HICO-DET datasets, and the results validate that our approach outperforms state-of-the-art methods under both fully-supervised and zero-shot settings. Code is available at https://github.com/yeliudev/ConsNet.
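The abstract's scoring step, measuring similarity between a candidate pair's embedding and each HOI label's embedding in a joint space, can be sketched in minimal form. This is not the authors' implementation: the 3-dimensional toy vectors and label names are hypothetical, cosine similarity stands in for the learned similarity measure, and the learned mapping into the joint space is assumed to have already been applied.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def score_hoi(pair_embedding, label_embeddings):
    """Score a candidate human-object pair against every HOI label
    by similarity in the (assumed shared) joint embedding space."""
    return {label: cosine(pair_embedding, emb)
            for label, emb in label_embeddings.items()}

# Hypothetical toy label embeddings (real ones would come from the
# GAT-refined word embeddings of HOI labels).
labels = {
    "<human, ride, bicycle>": [0.9, 0.1, 0.0],
    "<human, hold, cup>": [0.0, 0.8, 0.2],
}
# Hypothetical embedding of one candidate human-object pair.
pair = [0.85, 0.15, 0.05]

scores = score_hoi(pair, labels)
best = max(scores, key=scores.get)
```

In the full model, this per-label similarity would be computed for every candidate pair produced by an object detector, yielding ranked <human, action, object> detections; unseen labels are handled because their embeddings live in the same space as seen ones.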


Datasets

HICO-DET, V-COCO
Results

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Zero-Shot Human-Object Interaction Detection | HICO-DET | ConsNet (ResNet-50) | mAP (UC) | 19.81 | #3 |
| Zero-Shot Human-Object Interaction Detection | HICO-DET | ConsNet (ResNet-50) | mAP (UO) | 20.71 | #2 |
| Zero-Shot Human-Object Interaction Detection | HICO-DET | ConsNet (ResNet-50) | mAP (UA) | 19.04 | #1 |
| Human-Object Interaction Detection | HICO-DET | ConsNet-F (ResNet-50) | mAP | 25.94 | #33 |
| Human-Object Interaction Detection | HICO-DET | ConsNet (ResNet-50) | mAP | 22.15 | #40 |

Methods

No methods listed for this paper.