Search Results for author: Zitian Tang

Found 2 papers, 1 paper with code

Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding

no code implementations · 30 Nov 2023 · Rohan Myer Krishnan, Zitian Tang, Zhiqiu Yu, Chen Sun

To do this, video-language models must be able to obtain structured understandings, such as the temporal segmentation of a demonstration into sequences of actions and skills, and to generalize the understandings to novel domains.

Video Retrieval, Video Understanding
