Human-Object Interaction Detection
132 papers with code • 6 benchmarks • 22 datasets
Human-Object Interaction (HOI) detection is the task of identifying a set of interactions in an image. It involves i) localizing the subject (i.e., the human) and the target (i.e., the object) of each interaction, and ii) classifying the interaction label.
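Concretely, an HOI detector outputs a set of ⟨human, verb, object⟩ triplets, each with localized boxes and a confidence score. A minimal sketch of this output structure (the class and field names are illustrative, not from any specific library):

```python
from dataclasses import dataclass

@dataclass
class HOIDetection:
    """One detected human-object interaction triplet: <human, verb, object>."""
    human_box: tuple    # (x1, y1, x2, y2) bounding box of the human
    object_box: tuple   # (x1, y1, x2, y2) bounding box of the target object
    object_label: str   # object category, e.g. "bicycle"
    verb: str           # interaction label, e.g. "ride"
    score: float        # detection confidence

# An image is described by a set of such triplets; the same human
# box can participate in several interactions.
detections = [
    HOIDetection((10, 20, 110, 220), (30, 150, 200, 260), "bicycle", "ride", 0.92),
    HOIDetection((10, 20, 110, 220), (120, 40, 160, 80), "backpack", "wear", 0.75),
]

# Example query: all "ride" interactions in the image.
ridden = [d for d in detections if d.verb == "ride"]
```

Benchmarks such as HICO-DET score a triplet as correct only when both boxes overlap the ground truth and the verb matches.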
Benchmarks
These leaderboards are used to track progress in Human-Object Interaction Detection
Libraries
Use these libraries to find Human-Object Interaction Detection models and implementations.

Most implemented papers
D3D-HOI: Dynamic 3D Human-Object Interactions from Videos
We evaluate this approach on our dataset, demonstrating that human-object relations can significantly reduce the ambiguity of articulated object reconstructions from challenging real-world videos.
Learning Affordance Grounding from Exocentric Images
To empower an agent with such ability, this paper proposes the task of affordance grounding from the exocentric view: given an exocentric human-object interaction image and an egocentric object image, learn the affordance knowledge of the object and transfer it to the egocentric image, using only the affordance label as supervision.
Discovering Human-Object Interaction Concepts via Self-Compositional Learning
Therefore, the proposed method enables the learning on both known and unknown HOI concepts.
Grounded Affordance from Exocentric View
Because affordances are diverse and different individuals interact with objects in unique ways, interactions vary widely, which makes it difficult to establish an explicit link between object parts and affordance labels.
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).
Open-Set Image Tagging with Multi-Grained Text Supervision
Specifically, for predefined commonly used tag categories, RAM++ showcases 10.2 mAP and 15.4 mAP enhancements over CLIP on OpenImages and ImageNet.
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone.
Attentional Pooling for Action Recognition
We introduce a simple yet surprisingly powerful model to incorporate attention in action recognition and human object interaction tasks.
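The core idea, replacing uniform average pooling with an attention-weighted sum over spatial features, can be sketched as follows (a simplified illustration, not the paper's exact model; the attention vector here is random for demonstration, whereas in the paper it is learned):

```python
import numpy as np

def attentional_pool(features, attn_vector):
    """Pool H*W spatial feature vectors into one vector, weighting each
    location by a softmax-normalized attention score instead of averaging."""
    scores = features @ attn_vector          # (H*W,) raw attention logits
    scores = np.exp(scores - scores.max())   # stable exponentiation
    attn = scores / scores.sum()             # softmax over spatial locations
    return attn @ features                   # (D,) attention-weighted sum

rng = np.random.default_rng(0)
features = rng.standard_normal((49, 8))   # a 7x7 grid of 8-dim CNN features
attn_vector = rng.standard_normal(8)      # stand-in for a learned attention vector
pooled = attentional_pool(features, attn_vector)
```

Average pooling is the special case where every location gets weight 1/(H*W); learning the weights lets the model focus on the regions where the interaction happens.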
Pairwise Body-Part Attention for Recognizing Human-Object Interactions
We propose a new pairwise body-part attention model which can learn to focus on crucial body parts and their correlations for HOI recognition.
Learning Human-Object Interactions by Graph Parsing Neural Networks
For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels.
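This parse-graph representation can be sketched as follows (a hypothetical toy example of the data structure, not the paper's inference procedure): nodes are detected humans and objects, the adjacency matrix stores inferred edge strengths, and node labels carry the predicted categories.

```python
import numpy as np

# Nodes: one human and two objects detected in a scene (illustrative).
nodes = ["human_0", "cup_0", "table_0"]

# Adjacency matrix: entry (i, j) is the inferred strength of the
# interaction edge between node i and node j (symmetric here).
adjacency = np.array([
    [0.0, 0.9, 0.1],   # human_0 interacts strongly with cup_0
    [0.9, 0.0, 0.4],
    [0.1, 0.4, 0.0],
])

# Node labels: the category inferred for each node.
labels = {"human_0": "drinking", "cup_0": "cup", "table_0": "table"}

# Keep only edges above a threshold to form the parse graph.
threshold = 0.5
edges = [
    (nodes[i], nodes[j])
    for i in range(len(nodes))
    for j in range(i + 1, len(nodes))
    if adjacency[i, j] > threshold
]
# edges -> [("human_0", "cup_0")]
```

The thresholded graph plus the node labels together describe the scene's interactions, e.g. the human drinking from the cup.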