Keypoint Detection
148 papers with code • 7 benchmarks • 11 datasets
Keypoint Detection involves simultaneously detecting people and localizing their keypoints. Keypoints, also called interest points, are spatial locations in an image that mark what is interesting or what stands out. Ideally, they remain stable under image rotation, scaling, translation, distortion, and so on.
(Image credit: PifPaf: Composite Fields for Human Pose Estimation; "Learning to surf" by fotologic, license: CC-BY-2.0)
Most implemented papers
ArtTrack: Articulated Multi-person Tracking in the Wild
In this paper we propose an approach for articulated tracking of multiple people in unconstrained videos.
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation
We rethink a well-known bottom-up approach for multi-person pose estimation and propose an improved one.
Rethinking on Multi-Stage Networks for Human Pose Estimation
Existing pose estimation approaches fall into two categories: single-stage and multi-stage methods.
Pose2Seg: Detection Free Human Instance Segmentation
We demonstrate that our pose-based framework can achieve better accuracy than the state-of-the-art detection-based approach on the human instance segmentation problem, and can also handle occlusion better.
Distribution-Aware Coordinate Representation for Human Pose Estimation
Interestingly, we found that the process of decoding the predicted heatmaps into the final joint coordinates in the original image space is surprisingly significant for human pose estimation performance, which nevertheless was not recognised before.
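To make the decoding step concrete, here is a minimal sketch of the plain argmax baseline: take the peak of each predicted heatmap and rescale it to the original image. This is not the distribution-aware method the paper proposes. The function name `decode_heatmaps`, the `(K, H, W)` heatmap layout, and the cell-centre convention are assumptions made for illustration.

```python
import numpy as np

def decode_heatmaps(heatmaps, image_size):
    """Decode per-joint heatmaps into image-space coordinates.

    heatmaps:   array of shape (K, H, W), one heatmap per joint (assumed layout).
    image_size: (height, width) of the original image.
    Returns an array of shape (K, 2) with (x, y) in image pixels and an
    array of shape (K,) with the peak value of each heatmap.
    """
    num_joints, hm_h, hm_w = heatmaps.shape
    flat = heatmaps.reshape(num_joints, -1)

    # Location and value of the maximum response for every joint.
    idx = flat.argmax(axis=1)
    scores = flat[np.arange(num_joints), idx]
    ys, xs = np.unravel_index(idx, (hm_h, hm_w))

    # Map the centre of each heatmap cell back to the original resolution.
    img_h, img_w = image_size
    coords = np.stack(
        [(xs + 0.5) * img_w / hm_w, (ys + 0.5) * img_h / hm_h], axis=1
    )
    return coords, scores
```

Because the heatmap is usually coarser than the input image, this baseline can only return positions on the heatmap grid; the resulting quantisation error is the kind of decoding detail the abstract refers to.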
OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association
We present a generic neural network architecture that uses Composite Fields to detect and construct a spatio-temporal pose which is a single, connected graph whose nodes are the semantic keypoints (e.g., a person's body joints) in multiple frames.
Polarized Self-Attention: Towards High-quality Pixel-wise Regression
Pixel-wise regression is probably the most common problem in fine-grained computer vision tasks, such as estimating keypoint heatmaps and segmentation masks.
Associative Embedding: End-to-End Learning for Joint Detection and Grouping
We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping.
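The grouping idea can be sketched as follows, assuming each detected keypoint carries a scalar tag and that keypoints belonging to the same person receive similar tags. This is a simplified greedy illustration, not the paper's implementation; `group_by_tag`, the detection tuple layout, and `tag_threshold` are hypothetical names chosen for this example.

```python
def group_by_tag(detections, tag_threshold=0.5):
    """Greedily group keypoint detections into people by tag similarity.

    detections: iterable of (joint_type, x, y, tag) tuples (assumed layout).
    Returns a list of dicts, one per person, mapping joint_type -> (x, y).
    """
    groups = []  # each group: {"tags": [floats], "joints": {type: (x, y)}}
    for joint_type, x, y, tag in detections:
        best, best_dist = None, tag_threshold
        for group in groups:
            # A person can hold at most one joint of each type.
            if joint_type in group["joints"]:
                continue
            mean_tag = sum(group["tags"]) / len(group["tags"])
            dist = abs(tag - mean_tag)
            if dist < best_dist:
                best, best_dist = group, dist
        if best is None:
            # No existing person is close enough in tag space: start a new one.
            best = {"tags": [], "joints": {}}
            groups.append(best)
        best["tags"].append(tag)
        best["joints"][joint_type] = (x, y)
    return [group["joints"] for group in groups]


detections = [
    ("head", 10, 5, 0.1),
    ("head", 50, 6, 2.0),
    ("hip", 12, 40, 0.2),
    ("hip", 52, 41, 1.9),
]
people = group_by_tag(detections)
```

In this toy input the two heads start separate groups, and each hip joins the group whose mean tag is nearest, so `people` holds two entries with one head and one hip each.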
Cascaded Pyramid Network for Multi-Person Pose Estimation
In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to address the problem posed by these "hard" keypoints.
Dynamic Convolution: Attention over Convolution Kernels
Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability.