# Hand Keypoint Detection in Single Images using Multiview Bootstrapping

The method is used to train a hand keypoint detector for single images.

# KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects

We address two problems: first, we establish an easy method for capturing and labeling 3D keypoints on desktop objects with an RGB camera; and second, we develop a deep neural network, called KeyPose, that learns to accurately predict object poses from stereo input using 3D keypoints, and works even for transparent objects.

# XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long- and short-range skip connections to improve the information flow, allowing for a drastically faster network without compromising accuracy.

# VIBE: Video Inference for Human Body Pose and Shape Estimation

Human motion is fundamental to understanding behavior.

# Unsupervised learning with sparse space-and-time autoencoders

We use spatially-sparse two-, three-, and four-dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and 3+1=4-dimensional space-time.

# FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

To construct FrankMocap, we build a state-of-the-art monocular 3D hand motion capture method by taking the hand part of the whole-body parametric model (SMPL-X).

# Scalable Gradients for Stochastic Differential Equations

The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations.
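As a rough illustration of the adjoint sensitivity method for ordinary differential equations, the sketch below differentiates the solution of a toy problem with respect to its parameter. Everything in it is an assumption made for the example and is not taken from the paper: the scalar ODE dz/dt = -theta * z, the loss L = z(T), the hand-written RK4 integrator, and the helper names `rk4_step` and `adjoint_gradient`. It only covers the deterministic ODE case, not stochastic differential equations.

```python
import math


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for a list-valued state y (h may be negative)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def adjoint_gradient(theta, z0, T, steps=1000):
    """Return (z(T), dL/dtheta) for dz/dt = -theta * z with loss L = z(T)."""
    h = T / steps

    # Forward pass: solve the ODE and keep only the final state.
    state = [z0]
    t = 0.0
    for _ in range(steps):
        state = rk4_step(lambda _t, y: [-theta * y[0]], t, state, h)
        t += h
    z_final = state[0]

    # Backward pass over the augmented state [z, a, g]:
    #   z is re-solved backwards, so no forward trajectory is stored;
    #   a(t) = dL/dz(t) obeys da/dt = -a * df/dz = theta * a, with a(T) = 1;
    #   g(t) accumulates the integral of a * df/dtheta from t to T,
    #   so dg/dt = -a * df/dtheta = a * z, with g(T) = 0.
    def augmented(_t, y):
        z, a, _g = y
        return [-theta * z, theta * a, a * z]

    aug = [z_final, 1.0, 0.0]
    for _ in range(steps):
        aug = rk4_step(augmented, t, aug, -h)  # integrate from T back to 0
        t -= h
    return z_final, aug[2]


# Compare against the closed form: z(T) = z0 * exp(-theta * T),
# so dL/dtheta = -T * z0 * exp(-theta * T).
z_T, grad = adjoint_gradient(theta=0.5, z0=2.0, T=1.0)
exact = -1.0 * 2.0 * math.exp(-0.5)
print(abs(grad - exact) < 1e-8)
```

The backward pass needs only the final state and a constant amount of extra memory, which is the sense in which the method scales; in this toy case the product a(t) * z(t) is constant, so the result is easy to check by hand.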

# Neural Relational Inference for Interacting Systems

Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics.

# Monocular Total Capture: Posing Face, Body, and Hands in the Wild

We present the first method to capture the 3D total motion of a target person from a monocular view input.

# Skeleton-Aware Networks for Deep Motion Retargeting

Our operators form the building blocks of a new deep motion processing framework that embeds the motion into a common latent space, shared by a collection of homeomorphic skeletons.

