UniPose: Detecting Any Keypoints
This work proposes a unified framework called UniPose to detect keypoints of any articulated (e.g., human and animal), rigid, or soft object via visual or textual prompts, for fine-grained vision understanding and manipulation. A keypoint is a structure-aware, pixel-level, and compact representation of an object, especially an articulated one. Existing fine-grained promptable tasks mainly focus on object instance detection and segmentation but often fail to capture the fine-grained, structured information of images and instances, such as eyes, legs, and paws. Meanwhile, prompt-based keypoint detection remains under-explored. To bridge this gap, we make the first attempt to develop an end-to-end prompt-based keypoint detection framework, UniPose, that detects keypoints of any object. Because keypoint detection tasks are unified in this framework, we can leverage 13 keypoint detection datasets with 338 keypoints across 1,237 categories and over 400K instances to train a generic keypoint detection model. UniPose effectively aligns text-to-keypoint and image-to-keypoint representations through the mutual enhancement of textual and visual prompts, driven by cross-modality contrastive learning objectives. Our experimental results show that UniPose has strong fine-grained localization and generalization abilities across image styles, categories, and poses. We hope that UniPose, as a generalist keypoint detector, can serve fine-grained visual perception, understanding, and generation.
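The abstract attributes text-to-keypoint alignment to cross-modality contrastive learning objectives but does not spell them out here. Purely as an illustration of the general idea, the sketch below shows a generic InfoNCE-style loss that pulls each predicted keypoint feature toward the embedding of its matching keypoint-name prompt (e.g. "left eye") and pushes it away from the other prompts. This is not UniPose's actual objective: the function names, the 0.07 temperature, and the assumption that every keypoint feature has exactly one matching text prompt are all hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize a vector so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_keypoint_text_loss(
    keypoint_feats: Sequence[Sequence[float]],
    text_embeds: Sequence[Sequence[float]],
    targets: Sequence[int],
    temperature: float = 0.07,  # hypothetical value, a common default in CLIP-style losses
) -> float:
    """Mean InfoNCE-style cross-entropy over keypoint features (illustrative sketch).

    keypoint_feats[i]: feature of the i-th predicted keypoint.
    text_embeds[j]:    embedding of the j-th keypoint-name prompt.
    targets[i]:        index of the prompt that matches keypoint i (assumption).
    """
    texts = [_normalize(t) for t in text_embeds]
    total = 0.0
    for feat, target in zip(keypoint_feats, targets):
        f = _normalize(feat)
        # Cosine similarity to every prompt, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(f, t)) / temperature for t in texts]
        # Numerically stable log-sum-exp; loss = -log softmax(logits)[target].
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[target]
    return total / len(targets)
```

With well-aligned features the loss is small, and it grows when the assignment is swapped:

```python
feats = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[1.0, 0.1], [0.1, 1.0]]
aligned = contrastive_keypoint_text_loss(feats, prompts, [0, 1])
swapped = contrastive_keypoint_text_loss(feats, prompts, [1, 0])
```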
Results from the Paper
Ranked #1 on 2D Human Pose Estimation on Human-Art (using extra training data)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
2D Pose Estimation | 300W | UniPose | Mean PCK@0.2 | 99.4 | #1 |
2D Pose Estimation | Animal Kingdom | UniPose | Mean PCK@0.2 | 96.1 | #1 |
2D Pose Estimation | Animal Kingdom | UniPose | PCK@0.05 | 71.5 | #1 |
Animal Pose Estimation | AP-10K | UniPose | AP | 79.2 | #4 |
2D Pose Estimation | Desert Locust | UniPose | Mean PCK@0.2 | 99.9 | #1 |
2D Human Pose Estimation | Human-Art | UniPose | AP | 0.759 | #1 |
2D Pose Estimation | MacaquePose | UniPose | AP | 79.4 | #1 |
Multi-Person Pose Estimation | MS COCO | UniPose | AP | 0.768 | #3 |
2D Pose Estimation | Vinegar Fly | UniPose | Mean PCK@0.2 | 99.9 | #1 |
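Several rows report PCK (Percentage of Correct Keypoints) at thresholds 0.2 and 0.05. A predicted keypoint counts as correct when its distance to the ground truth is below alpha times a normalization length. The minimal sketch below assumes the caller supplies one normalization length per instance. Benchmarks differ on what that length is (for example, a bounding-box side or a head or torso size), on strict versus inclusive comparison, and on how "Mean PCK" is averaged. The function `pck` is hypothetical and is not the official evaluation code of any dataset listed above.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def pck(
    preds: Sequence[Sequence[Point]],
    gts: Sequence[Sequence[Point]],
    norm_lengths: Sequence[float],
    alpha: float = 0.2,
    visible: Optional[Sequence[Sequence[bool]]] = None,
) -> float:
    """PCK@alpha: fraction of counted keypoints whose prediction lies
    strictly closer than alpha * norm_length to the ground truth."""
    correct = 0
    counted = 0
    for i, (pred_kps, gt_kps) in enumerate(zip(preds, gts)):
        # Per-instance distance threshold (normalization choice is an assumption).
        threshold = alpha * norm_lengths[i]
        for k, (p, g) in enumerate(zip(pred_kps, gt_kps)):
            # Skip keypoints that are not annotated or not visible.
            if visible is not None and not visible[i][k]:
                continue
            counted += 1
            if math.dist(p, g) < threshold:
                correct += 1
    return correct / counted if counted else 0.0
```

For one instance with normalization length 20, a prediction 10 pixels off fails at alpha = 0.2 (threshold 4) but passes at alpha = 0.6 (threshold 12), which is why stricter thresholds such as PCK@0.05 yield lower scores than PCK@0.2:

```python
preds = [[(0.0, 0.0), (10.0, 0.0)]]
gts = [[(0.0, 0.0), (0.0, 0.0)]]
pck(preds, gts, [20.0], alpha=0.2)  # one of two keypoints correct
pck(preds, gts, [20.0], alpha=0.6)  # both correct
```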