We present an approach that takes advantage of both structure and semantics for unsupervised monocular learning of depth and ego-motion.
Deep learning for predicting or generating 3D human pose sequences is an active research area.
Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality.
We address the unsupervised learning of several interconnected problems in low-level vision: single-view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions.
#20 best model for Monocular Depth Estimation on KITTI Eigen split
Many video enhancement algorithms rely on optical flow to register frames in a video sequence.
#5 best model for Video Frame Interpolation on Middlebury
At each time step, the system receives a video frame as input, predicts the optical flow as a dense transformation map from the current observation and the LSTM memory state, and applies this map to the current frame to generate the next frame.
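The final step described above, applying a dense transformation map to the current frame, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `warp_frame`, the backward-sampling convention, the bilinear interpolation, and the border clamping are all assumptions made for illustration, and the flow is passed in directly rather than predicted by an LSTM.

```python
import math


def warp_frame(frame, flow):
    """Backward-warp a grayscale frame with a dense per-pixel flow map.

    frame: list of H rows, each a list of W floats.
    flow:  list of H rows, each a list of W (dx, dy) tuples. Output pixel
           (x, y) samples the input at (x + dx, y + dy). This sampling
           convention is an assumption of this sketch.
    """
    height, width = len(frame), len(frame[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Source location, clamped to the image borders (assumed policy).
            sx = min(max(x + dx, 0.0), width - 1.0)
            sy = min(max(y + dy, 0.0), height - 1.0)
            # Integer corners surrounding the source location.
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
            wx, wy = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - wx) * frame[y0][x0] + wx * frame[y0][x1]
            bottom = (1 - wx) * frame[y1][x0] + wx * frame[y1][x1]
            warped[y][x] = (1 - wy) * top + wy * bottom
    return warped


# Hypothetical usage: a uniform flow of one pixel along x.
current = [[0.0, 1.0, 2.0],
           [3.0, 4.0, 5.0]]
uniform_flow = [[(1.0, 0.0)] * 3 for _ in range(2)]
predicted_next = warp_frame(current, uniform_flow)
```

Backward sampling is used here because every output pixel receives exactly one value, which avoids the holes that forward splatting can leave. In a learned system the interpolation would typically be a differentiable operation so that gradients can reach the flow predictor.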