Human-Object Interaction Detection

132 papers with code • 6 benchmarks • 22 datasets

Human-Object Interaction (HOI) detection is the task of identifying a set of interactions in an image. It involves i) localizing the subject (i.e., the human) and the target (i.e., the object) of each interaction, and ii) classifying the interaction labels.
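
Concretely, a detector's output can be thought of as a list of scored triplets. The sketch below shows one minimal way to represent them in Python; all names are illustrative and not tied to any particular codebase:

```python
# Minimal sketch of the structure an HOI detector outputs: one record per
# <human, interaction, object> triplet. All names here are illustrative.
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class HOIPrediction:
    human_box: Box        # localized subject (the human)
    object_box: Box       # localized target (the object)
    object_label: str     # e.g. "bicycle"
    interaction: str      # classified interaction label, e.g. "ride"
    score: float          # detector confidence for the triplet

# An image with one person riding and another holding the same bicycle
# yields two triplets over a shared object box:
preds = [
    HOIPrediction((10, 20, 110, 220), (40, 120, 200, 260), "bicycle", "ride", 0.91),
    HOIPrediction((150, 30, 250, 230), (40, 120, 200, 260), "bicycle", "hold", 0.47),
]
```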

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

faceonlive/ai-research 9 Apr 2024

In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone.
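
As a rough illustration of why context-rich language helps, the hypothetical sketch below scores an image region against both a bare category name and a fuller description, using an off-the-shelf CLIP model via Hugging Face transformers. The prompts, file name, and model choice are assumptions for illustration, not the paper's method:

```python
# Compare a bare category name against a context-rich description when
# scoring a cropped human-object region with CLIP (illustrative only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

bare_name = ["dribble basketball"]
contextual = ["a person bouncing a basketball with one hand while moving"]

image = Image.open("region.jpg")  # hypothetical cropped human-object region
inputs = processor(text=bare_name + contextual, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_texts) similarities
print(logits)  # the richer description often scores rare concepts better
```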

Disentangled Pre-training for Human-Object Interaction Detection

xingaoli/dp-hoi 2 Apr 2024

Therefore, we propose an efficient disentangled pre-training method for HOI detection (DP-HOI) to address this problem.

Glance and Focus: Memory Prompting for Multi-Event Video Question Answering

byz0e/glance-focus NeurIPS 2023

Instead, we train an Encoder-Decoder to generate a set of dynamic event memories at the glancing stage.
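
A minimal PyTorch sketch of this glancing idea, assuming a DETR-style design in which a fixed set of learned queries is decoded against the encoded video to produce event memories; all sizes are illustrative, not the paper's configuration:

```python
# Learned event queries decoded against the full video feature sequence
# produce a fixed set of dynamic event memories (DETR-style sketch).
import torch
import torch.nn as nn

class GlanceMemory(nn.Module):
    def __init__(self, d_model=256, num_memories=10):
        super().__init__()
        self.event_queries = nn.Parameter(torch.randn(num_memories, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)

    def forward(self, frame_feats):            # (B, T, d_model) video features
        ctx = self.encoder(frame_feats)        # "glance" over the whole video
        queries = self.event_queries.unsqueeze(0).expand(frame_feats.size(0), -1, -1)
        return self.decoder(queries, ctx)      # (B, num_memories, d_model)

memories = GlanceMemory()(torch.randn(2, 128, 256))  # 2 videos, 128 frames
print(memories.shape)  # torch.Size([2, 10, 256])
```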

Ins-HOI: Instance Aware Human-Object Interactions Recovery

jiajunzhang16/ins-hoi 15 Dec 2023

To address this, we further propose a complementary training strategy that leverages synthetic data to introduce instance-level shape priors, enabling the disentanglement of occupancy fields for different instances.
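
One way to picture a disentangled occupancy field is an implicit network that predicts a separate occupancy value per instance (e.g., human vs. object) for every query point, so each shape can be reconstructed independently. The sketch below is an illustrative assumption, not the paper's architecture:

```python
# Per-instance occupancy heads over a shared point encoder, so the
# occupancy fields of different instances are disentangled (illustrative).
import torch
import torch.nn as nn

class InstanceOccupancy(nn.Module):
    def __init__(self, num_instances=2, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # One occupancy head per instance instead of a single fused field.
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(num_instances))

    def forward(self, points):                 # (N, 3) 3D query points
        feat = self.backbone(points)
        # (N, num_instances): occupancy probability of each instance per point
        return torch.sigmoid(torch.cat([h(feat) for h in self.heads], dim=-1))

occ = InstanceOccupancy()(torch.rand(1024, 3))
print(occ.shape)  # torch.Size([1024, 2])
```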

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models

chenyi99/egoplan 11 Dec 2023

Given diverse environmental inputs, including real-time task progress, visual observations, and open-form language instructions, a proficient task planner is expected to predict feasible actions, which is a feat inherently achievable by Multimodal Large Language Models (MLLMs).
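
The sketch below illustrates one possible shape of such a planner interface, with a hypothetical `score_with_mllm` standing in for a real multimodal LLM call; the prompt format and function names are assumptions, not the benchmark's protocol:

```python
# Pick the most feasible next action given task progress, the current
# observation, and an instruction (hypothetical interface sketch).
from typing import List

def build_prompt(goal: str, progress: List[str], candidates: List[str]) -> str:
    done = "; ".join(progress) if progress else "nothing yet"
    opts = "\n".join(f"({i}) {c}" for i, c in enumerate(candidates))
    return (f"Task goal: {goal}\nCompleted so far: {done}\n"
            f"Given the current egocentric frame, pick the next action:\n{opts}")

def score_with_mllm(prompt: str, image_path: str, candidate: str) -> float:
    # Placeholder: a real system would return the MLLM's log-likelihood
    # of `candidate` given the prompt and the image.
    return -float(len(candidate))

def plan_next(goal, progress, candidates, image_path):
    prompt = build_prompt(goal, progress, candidates)
    return max(candidates, key=lambda c: score_with_mllm(prompt, image_path, c))

print(plan_next("make coffee", ["fill kettle"],
                ["boil water", "grind beans", "pour milk"], "frame.jpg"))
```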

Instance Tracking in 3D Scenes from Egocentric Videos

it3dego/it3dego 7 Dec 2023

We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.
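
A minimal sketch of how the two annotation spaces relate, assuming a standard pinhole projection with per-frame camera pose; the intrinsics values are illustrative placeholders:

```python
# Map a 3D instance location in world coordinates to a frame's 2D pixel
# coordinates using the per-frame camera pose and intrinsics.
import numpy as np

K = np.array([[600.0,   0.0, 320.0],   # illustrative intrinsics (fx, fy, cx, cy)
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def world_to_pixel(p_world, T_world_to_cam):
    """Project a 3D world point into 2D pixels for one frame."""
    p_cam = T_world_to_cam[:3, :3] @ p_world + T_world_to_cam[:3, 3]
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]                 # perspective divide by depth

T = np.eye(4)                               # identity pose: camera at world origin
print(world_to_pixel(np.array([0.1, -0.05, 2.0]), T))  # ~[350. 225.]
```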

Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

caoyichao/unihoi NeurIPS 2023

We conduct a deep analysis of the three hierarchical features inherent in visual HOI detectors and propose a method for high-level relation extraction aimed at VL foundation models, which we call HO prompt-based learning.
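
For intuition, the sketch below shows the general prompt-based learning technique on a frozen vision-language model: a few learnable context vectors are prepended to the embedded class-name tokens and only those vectors are trained (CoOp-style). It illustrates the family of methods, not the paper's specific HO prompts:

```python
# Learnable context vectors prepended to embedded class names, to be fed
# to a frozen text encoder; only the context vectors receive gradients.
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    def __init__(self, n_ctx=4, d_embed=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, d_embed) * 0.02)

    def forward(self, name_embeds):             # (num_classes, L, d_embed)
        n = name_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)
        return torch.cat([ctx, name_embeds], dim=1)  # prepend learned context

prompts = LearnablePrompt()(torch.randn(5, 8, 512))
print(prompts.shape)  # torch.Size([5, 12, 512]) -> into frozen text encoder
```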

Object-centric Video Representation for Long-term Action Anticipation

brown-palm/ObjectPrompt 31 Oct 2023

To recognize and predict human-object interactions, we use a Transformer-based neural architecture which allows the "retrieval" of relevant objects for action anticipation at various time scales.
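
A minimal sketch of this attention-as-retrieval idea, assuming action queries that cross-attend over detected-object features so the attention weights act as a soft selection of relevant objects; the dimensions are illustrative, not the paper's configuration:

```python
# Action queries cross-attend over object features; the attention weights
# form a soft "retrieval" of objects relevant to the anticipated action.
import torch
import torch.nn as nn

d = 256
attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)
action_queries = torch.randn(1, 4, d)   # queries for 4 anticipation time scales
object_feats = torch.randn(1, 20, d)    # features of 20 detected objects

retrieved, weights = attn(action_queries, object_feats, object_feats)
print(retrieved.shape, weights.shape)   # (1, 4, 256) (1, 4, 20)
```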

Open-Set Image Tagging with Multi-Grained Text Supervision

xinyu1205/Recognize_Anything-Tag2Text 23 Oct 2023

Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

syscv/sam-hq 26 Sep 2023

ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).
