Human-Object Interaction Detection

132 papers with code • 6 benchmarks • 22 datasets

Human-Object Interaction (HOI) detection is the task of identifying a set of interactions in an image. It involves i) localizing the subject (i.e., the human) and the target (i.e., the object) of each interaction, and ii) classifying the interaction labels.
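
Concretely, a detector's output can be thought of as a list of scored triplets. The sketch below shows one minimal way to represent them in Python; all names are illustrative and not tied to any particular codebase:

```python
# Minimal sketch of the structure an HOI detector outputs: one record per
# <human, interaction, object> triplet. All names here are illustrative.
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class HOIPrediction:
    human_box: Box        # localized subject (the human)
    object_box: Box       # localized target (the object)
    object_label: str     # e.g. "bicycle"
    interaction: str      # classified interaction label, e.g. "ride"
    score: float          # detector confidence for the triplet

# An image with one person riding and another holding the same bicycle
# yields two triplets over a shared object box:
preds = [
    HOIPrediction((10, 20, 110, 220), (40, 120, 200, 260), "bicycle", "ride", 0.91),
    HOIPrediction((150, 30, 250, 230), (40, 120, 200, 260), "bicycle", "hold", 0.47),
]
```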

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

faceonlive/ai-research 9 Apr 2024

In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone.
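
As a rough illustration of why context-rich language helps, the hypothetical sketch below scores an image region against both a bare category name and a fuller description, using an off-the-shelf CLIP model via Hugging Face transformers. The prompts, file name, and model choice are assumptions for illustration, not the paper's method:

```python
# Compare a bare category name against a context-rich description when
# scoring a cropped human-object region with CLIP (illustrative only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

bare_name = ["dribble basketball"]
contextual = ["a person bouncing a basketball with one hand while moving"]

image = Image.open("region.jpg")  # hypothetical cropped human-object region
inputs = processor(text=bare_name + contextual, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_texts) similarities
print(logits)  # the richer description often scores rare concepts better
```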

Disentangled Pre-training for Human-Object Interaction Detection

xingaoli/dp-hoi 2 Apr 2024

Therefore, we propose an efficient disentangled pre-training method for HOI detection (DP-HOI) to address this problem.

Glance and Focus: Memory Prompting for Multi-Event Video Question Answering

byz0e/glance-focus NeurIPS 2023

Instead, we train an Encoder-Decoder to generate a set of dynamic event memories at the glancing stage.
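
A minimal PyTorch sketch of this glancing idea, assuming a DETR-style design in which a fixed set of learned queries is decoded against the encoded video to produce event memories; all sizes are illustrative, not the paper's configuration:

```python
# Learned event queries decoded against the full video feature sequence
# produce a fixed set of dynamic event memories (DETR-style sketch).
import torch
import torch.nn as nn

class GlanceMemory(nn.Module):
    def __init__(self, d_model=256, num_memories=10):
        super().__init__()
        self.event_queries = nn.Parameter(torch.randn(num_memories, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)

    def forward(self, frame_feats):            # (B, T, d_model) video features
        ctx = self.encoder(frame_feats)        # "glance" over the whole video
        queries = self.event_queries.unsqueeze(0).expand(frame_feats.size(0), -1, -1)
        return self.decoder(queries, ctx)      # (B, num_memories, d_model)

memories = GlanceMemory()(torch.randn(2, 128, 256))  # 2 videos, 128 frames
print(memories.shape)  # torch.Size([2, 10, 256])
```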

Ins-HOI: Instance Aware Human-Object Interactions Recovery

jiajunzhang16/ins-hoi 15 Dec 2023

To address this, we further propose a complementary training strategy that leverages synthetic data to introduce instance-level shape priors, enabling the disentanglement of occupancy fields for different instances.
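
One way to picture a disentangled occupancy field is an implicit network that predicts a separate occupancy value per instance (e.g., human vs. object) for every query point, so each shape can be reconstructed independently. The sketch below is an illustrative assumption, not the paper's architecture:

```python
# Per-instance occupancy heads over a shared point encoder, so the
# occupancy fields of different instances are disentangled (illustrative).
import torch
import torch.nn as nn

class InstanceOccupancy(nn.Module):
    def __init__(self, num_instances=2, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # One occupancy head per instance instead of a single fused field.
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_instances))

    def forward(self, points):                 # (N, 3) 3D query points
        feat = self.backbone(points)
        # (N, num_instances): occupancy probability of each instance per point
        return torch.sigmoid(torch.cat([h(feat) for h in self.heads], dim=-1))

occ = InstanceOccupancy()(torch.rand(1024, 3))
print(occ.shape)  # torch.Size([1024, 2])
```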

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models

chenyi99/egoplan 11 Dec 2023

Given diverse environmental inputs, including real-time task progress, visual observations, and open-form language instructions, a proficient task planner is expected to predict feasible actions, which is a feat inherently achievable by Multimodal Large Language Models (MLLMs).
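
The sketch below illustrates one possible shape of such a planner interface, with a hypothetical `score_with_mllm` standing in for a real multimodal LLM call; the prompt format and function names are assumptions, not the benchmark's protocol:

```python
# Pick the most feasible next action given task progress, the current
# observation, and an instruction (hypothetical interface sketch).
from typing import List

def build_prompt(goal: str, progress: List[str], candidates: List[str]) -> str:
    done = "; ".join(progress) if progress else "nothing yet"
    opts = "\n".join(f"({i}) {c}" for i, c in enumerate(candidates))
    return (f"Task goal: {goal}\nCompleted so far: {done}\n"
            f"Given the current egocentric frame, pick the next action:\n{opts}")

def score_with_mllm(prompt: str, image_path: str, candidate: str) -> float:
    # Placeholder: a real system would return the MLLM's log-likelihood
    # of `candidate` given the prompt and the image.
    return -float(len(candidate))

def plan_next(goal, progress, candidates, image_path):
    prompt = build_prompt(goal, progress, candidates)
    return max(candidates, key=lambda c: score_with_mllm(prompt, image_path, c))

print(plan_next("make coffee", ["fill kettle"],
                ["boil water", "grind beans", "pour milk"], "frame.jpg"))
```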

Instance Tracking in 3D Scenes from Egocentric Videos

it3dego/it3dego 7 Dec 2023

We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.
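
A minimal sketch of how the two annotation spaces relate, assuming a standard pinhole projection with per-frame camera pose; the intrinsics values are illustrative placeholders:

```python
# Map a 3D instance location in world coordinates to a frame's 2D pixel
# coordinates using the per-frame camera pose and intrinsics.
import numpy as np

K = np.array([[600.0,   0.0, 320.0],   # illustrative intrinsics (fx, fy, cx, cy)
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def world_to_pixel(p_world, T_world_to_cam):
    """Project a 3D world point into 2D pixels for one frame."""
    p_cam = T_world_to_cam[:3, :3] @ p_world + T_world_to_cam[:3, 3]
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]                 # perspective divide by depth

T = np.eye(4)                               # identity pose: camera at world origin
print(world_to_pixel(np.array([0.1, -0.05, 2.0]), T))  # ~[350. 225.]
```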

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

caoyichao/unihoi NeurIPS 2023

We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning.
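
For intuition, the sketch below shows the general prompt-based learning technique on a frozen vision-language model: a few learnable context vectors are prepended to the embedded class-name tokens and only those vectors are trained (CoOp-style). It illustrates the family of methods, not the paper's specific HO prompts:

```python
# Learnable context vectors prepended to embedded class names, to be fed
# to a frozen text encoder; only the context vectors receive gradients.
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    def __init__(self, n_ctx=4, d_embed=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, d_embed) * 0.02)

    def forward(self, name_embeds):             # (num_classes, L, d_embed)
        n = name_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)
        return torch.cat([ctx, name_embeds], dim=1)  # prepend learned context

prompts = LearnablePrompt()(torch.randn(5, 8, 512))
print(prompts.shape)  # torch.Size([5, 12, 512]) -> into frozen text encoder
```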

Object-centric Video Representation for Long-term Action Anticipation

brown-palm/ObjectPrompt 31 Oct 2023

To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.
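
A minimal sketch of this attention-as-retrieval idea, assuming action queries that cross-attend over detected-object features so the attention weights act as a soft selection of relevant objects; the dimensions are illustrative, not the paper's configuration:

```python
# Action queries cross-attend over object features; the attention weights
# form a soft "retrieval" of objects relevant to the anticipated action.
import torch
import torch.nn as nn

d = 256
attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)
action_queries = torch.randn(1, 4, d)   # queries for 4 anticipation time scales
object_feats = torch.randn(1, 20, d)    # features of 20 detected objects

retrieved, weights = attn(action_queries, object_feats, object_feats)
print(retrieved.shape, weights.shape)   # (1, 4, 256) (1, 4, 20)
```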

Open-Set Image Tagging with Multi-Grained Text Supervision

xinyu1205/Recognize_Anything-Tag2Text 23 Oct 2023

Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

syscv/sam-hq 26 Sep 2023

ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).
