no code implementations • 30 Nov 2023 • Rohan Myer Krishnan, Zitian Tang, Zhiqiu Yu, Chen Sun
To do this, video-language models must be able to acquire a structured understanding, such as the temporal segmentation of a demonstration into sequences of actions and skills, and to generalize that understanding to novel domains.
1 code implementation • CVPR 2023 • Zitian Tang, Wenjie Ye, Wei-Chiu Ma, Hang Zhao
Inferring past human motion from RGB images is challenging due to the inherent uncertainty of the prediction problem.