The method is used to train a hand keypoint detector for single images.
We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called $KeyPose$, that learns to accurately predict object poses using 3D keypoints, from stereo input, and works even for transparent objects.
The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy.
Ranked #1 on 3D Multi-Person Pose Estimation on MuPoTS-3D
We use spatially-sparse two, three and four dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and 3+1=4 dimensional space-time.
To construct FrankMocap, we build the state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole body parametric model (SMPL-X).
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics.
We present the first method to capture the 3D total motion of a target person from a monocular view input.
Ranked #33 on 3D Human Pose Estimation on Human3.6M
In other words, our operators form the building blocks of a new deep motion processing framework that embeds the motion into a common latent space, shared by a collection of homeomorphic skeletons.