TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video-to-image Affordance Grounding	EPIC-Hotspot	Hotspot	KLD	1.26	# 3
Video-to-image Affordance Grounding	EPIC-Hotspot	Hotspot	SIM	0.40	# 3
Video-to-image Affordance Grounding	EPIC-Hotspot	Hotspot	AUC-J	0.79	# 3
Video-to-image Affordance Grounding	OPRA (28x28)	Hotspot	KLD	1.47	# 4
Video-to-image Affordance Grounding	OPRA (28x28)	Hotspot	SIM	0.36	# 4
Video-to-image Affordance Grounding	OPRA (28x28)	Hotspot	AUC-J	0.81	# 3
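
The KLD and SIM columns above compare a predicted hotspot heatmap with a ground-truth heatmap. As a rough illustration only, the sketch below assumes the formulation commonly used in saliency evaluation: both maps are normalized to sum to 1, KLD is the KL divergence of the prediction from the ground truth (lower is better), and SIM is the histogram intersection (higher is better). The function names and the `EPS` constant are hypothetical, the paper's own evaluation code may differ in details, and AUC-J is omitted.

```python
import math
from itertools import chain

EPS = 1e-12  # hypothetical smoothing constant; not taken from the paper


def normalize(heatmap):
    """Flatten a 2D heatmap (list of rows) and scale it to sum to 1."""
    flat = [float(v) for v in chain.from_iterable(heatmap)]
    total = sum(flat)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    return [v / total for v in flat]


def kld(pred, gt):
    """KL divergence of the prediction from the ground truth (lower is better)."""
    p, q = normalize(pred), normalize(gt)
    return sum(qi * math.log(EPS + qi / (pi + EPS)) for pi, qi in zip(p, q))


def sim(pred, gt):
    """Histogram intersection of the two distributions (higher is better, max 1)."""
    p, q = normalize(pred), normalize(gt)
    return sum(min(pi, qi) for pi, qi in zip(p, q))
```

Under these assumptions, two identical maps give a KLD near 0 and a SIM of 1, while maps with no overlapping mass give a SIM of 0.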

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounded-human-object-interaction-hotspots/video-to-image-affordance-grounding-on-epic)](https://paperswithcode.com/sota/video-to-image-affordance-grounding-on-epic?p=grounded-human-object-interaction-hotspots)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/grounded-human-object-interaction-hotspots/video-to-image-affordance-grounding-on-opra-1)](https://paperswithcode.com/sota/video-to-image-affordance-grounding-on-opra-1?p=grounded-human-object-interaction-hotspots)`

Grounded Human-Object Interaction Hotspots from Video

ICCV 2019 · Tushar Nagarajan, Christoph Feichtenhofer, Kristen Grauman

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements. We propose an approach to learn human-object interaction "hotspots" directly from video. Rather than treat affordances as a manually supervised semantic segmentation task, our approach learns about interactions by watching videos of real human behavior and anticipating afforded actions. Given a novel image or video, our model infers a spatial hotspot map indicating how an object would be manipulated in a potential interaction, even if the object is currently at rest. Through results with both first- and third-person video, we show the value of grounding affordances in real human-object interactions. Not only are our weakly supervised hotspots competitive with strongly supervised affordance methods, but they can also anticipate object interaction for novel object categories.


Code


Tushar-N/interaction-hotspots

Tasks


Human-Object Interaction Detection

Object

Object Recognition

Semantic Segmentation

Video-to-image Affordance Grounding

Datasets

Introduced in the Paper:

EPIC-Hotspot

Used in the Paper:

OPRA

Results from the Paper


Ranked #3 on Video-to-image Affordance Grounding on EPIC-Hotspot



