Human-Object Interaction Detection
132 papers with code • 6 benchmarks • 22 datasets
Human-Object Interaction (HOI) detection is the task of identifying a set of interactions in an image. It involves i) localizing the subject (i.e., the human) and the target (i.e., the object) of each interaction, and ii) classifying the interaction label.
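Concretely, an HOI detector outputs a set of ⟨human, verb, object⟩ triplets, each with localized boxes and a confidence score. A minimal sketch of this output structure (the class and field names are illustrative, not from any specific library):

```python
from dataclasses import dataclass

@dataclass
class HOIDetection:
    """One detected human-object interaction triplet: <human, verb, object>."""
    human_box: tuple    # (x1, y1, x2, y2) bounding box of the human
    object_box: tuple   # (x1, y1, x2, y2) bounding box of the target object
    object_label: str   # object category, e.g. "bicycle"
    verb: str           # interaction label, e.g. "ride"
    score: float        # detection confidence

# An image is described by a set of such triplets; the same human
# box can participate in several interactions.
detections = [
    HOIDetection((10, 20, 110, 220), (30, 150, 200, 260), "bicycle", "ride", 0.92),
    HOIDetection((10, 20, 110, 220), (120, 40, 160, 80), "backpack", "wear", 0.75),
]

# Example query: all "ride" interactions in the image.
ridden = [d for d in detections if d.verb == "ride"]
```

Benchmarks such as HICO-DET score a triplet as correct only when both boxes overlap the ground truth and the verb matches.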
Benchmarks
These leaderboards are used to track progress in Human-Object Interaction Detection
Libraries
Use these libraries to find Human-Object Interaction Detection models and implementations.

Most implemented papers
D3D-HOI: Dynamic 3D Human-Object Interactions from Videos
We evaluate this approach on our dataset, demonstrating that human-object relations can significantly reduce the ambiguity of articulated object reconstructions from challenging real-world videos.
Learning Affordance Grounding from Exocentric Images
To empower an agent with such ability, this paper proposes the task of affordance grounding from the exocentric view: given an exocentric human-object interaction image and an egocentric object image, learn the affordance knowledge of the object and transfer it to the egocentric image, using only the affordance label as supervision.
Discovering Human-Object Interaction Concepts via Self-Compositional Learning
Therefore, the proposed method enables the learning on both known and unknown HOI concepts.
Grounded Affordance from Exocentric View
Because affordances are diverse and different individuals interact with objects in unique ways, interactions vary widely, which makes it difficult to establish an explicit link between object parts and affordance labels.
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).
Open-Set Image Tagging with Multi-Grained Text Supervision
Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone.
Attentional Pooling for Action Recognition
We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks.
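The core idea, replacing uniform average pooling with an attention-weighted sum over spatial features, can be sketched as follows (a simplified illustration, not the paper's exact model; the attention vector here is random for demonstration, whereas in the paper it is learned):

```python
import numpy as np

def attentional_pool(features, attn_vector):
    """Pool H*W spatial feature vectors into one vector, weighting each
    location by a softmax-normalized attention score instead of averaging."""
    scores = features @ attn_vector          # (H*W,) raw attention logits
    scores = np.exp(scores - scores.max())   # stable exponentiation
    attn = scores / scores.sum()             # softmax over spatial locations
    return attn @ features                   # (D,) attention-weighted sum

rng = np.random.default_rng(0)
features = rng.standard_normal((49, 8))   # a 7x7 grid of 8-dim CNN features
attn_vector = rng.standard_normal(8)      # stand-in for a learned attention vector
pooled = attentional_pool(features, attn_vector)
```

Average pooling is the special case where every location gets weight 1/(H*W); learning the weights lets the model focus on the regions where the interaction happens.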
Pairwise Body-Part Attention for Recognizing Human-Object Interactions
We propose a new pairwise body-part attention model which can learn to focus on crucial body parts and their correlations for HOI recognition.
Learning Human-Object Interactions by Graph Parsing Neural Networks
For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.
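This parse-graph representation can be sketched as follows (a hypothetical toy example of the data structure, not the paper's inference procedure): nodes are detected humans and objects, the adjacency matrix stores inferred edge strengths, and node labels carry the predicted categories.

```python
import numpy as np

# Nodes: one human and two objects detected in a scene (illustrative).
nodes = ["human_0", "cup_0", "table_0"]

# Adjacency matrix: entry (i, j) is the inferred strength of the
# interaction edge between node i and node j (symmetric here).
adjacency = np.array([
    [0.0, 0.9, 0.1],   # human_0 interacts strongly with cup_0
    [0.9, 0.0, 0.4],
    [0.1, 0.4, 0.0],
])

# Node labels: the category inferred for each node.
labels = {"human_0": "drinking", "cup_0": "cup", "table_0": "table"}

# Keep only edges above a threshold to form the parse graph.
threshold = 0.5
edges = [
    (nodes[i], nodes[j])
    for i in range(len(nodes))
    for j in range(i + 1, len(nodes))
    if adjacency[i, j] > threshold
]
# edges -> [("human_0", "cup_0")]
```

The thresholded graph plus the node labels together describe the scene's interactions, e.g. the human drinking from the cup.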