Human-Object Interaction Detection

132 papers with code • 6 benchmarks • 22 datasets

Human-Object Interaction (HOI) detection is the task of identifying a set of interactions in an image. It involves i) localizing the subject (i.e., the human) and the target (i.e., the object) of each interaction, and ii) classifying the interaction label.
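As a concrete illustration, the output of this task can be sketched as a set of ⟨human box, object box, interaction⟩ triplets. This is a minimal sketch; the class and field names below are illustrative, not taken from any specific paper or library:

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class HOIDetection:
    """One detected interaction: subject and target boxes plus labels."""
    human_box: Box          # i) localization of the subject (human)
    object_box: Box         # i) localization of the target (object)
    object_label: str
    interaction_label: str  # ii) classification of the interaction
    score: float

# An image is then described by a set of such triplets, e.g. one person
# riding a bicycle while carrying a backpack (hypothetical values):
detections = [
    HOIDetection((12, 30, 180, 400), (150, 220, 260, 330), "bicycle", "ride", 0.91),
    HOIDetection((12, 30, 180, 400), (160, 100, 230, 160), "backpack", "carry", 0.78),
]
```

Note that the same human box can participate in several interactions, which is why the output is a set of triplets rather than a single label per person.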

Most implemented papers

D3D-HOI: Dynamic 3D Human-Object Interactions from Videos

facebookresearch/d3d-hoi 19 Aug 2021

We evaluate this approach on our dataset, demonstrating that human-object relations can significantly reduce the ambiguity of articulated object reconstructions from challenging real-world videos.

Learning Affordance Grounding from Exocentric Images

lhc1224/cross-view-ag CVPR 2022

To empower an agent with such ability, this paper proposes the task of affordance grounding from an exocentric view, i.e., given exocentric human-object interaction images and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.

Discovering Human-Object Interaction Concepts via Self-Compositional Learning

zhihou7/HOI-CL 27 Mar 2022

Therefore, the proposed method enables the learning on both known and unknown HOI concepts.

Grounded Affordance from Exocentric View

lhc1224/cross-view-affordance-grounding 28 Aug 2022

Due to the diversity of interactive affordances, different individuals interact with the same object in diverse ways, which makes it difficult to establish an explicit link between object parts and affordance labels.

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

syscv/sam-hq 26 Sep 2023

ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).

Open-Set Image Tagging with Multi-Grained Text Supervision

xinyu1205/recognize-anything 23 Oct 2023

Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

ltttpku/cmd-se-release 9 Apr 2024

In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone.

Attentional Pooling for Action Recognition

rohitgirdhar/AttentionalPoolingAction NeurIPS 2017

We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human-object interaction tasks.

Pairwise Body-Part Attention for Recognizing Human-Object Interactions

Imposingapple/Transferable_Interactiveness_Network_with_Partpair ECCV 2018

We propose a new pairwise body-part attention model which can learn to focus on crucial parts, and their correlations for HOI recognition.

Learning Human-Object Interactions by Graph Parsing Neural Networks

SiyuanQi/gpnn ECCV 2018

For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.