SSv2-Spatio-Temporal (Something-Something v2-Spatio-Temporal)

Introduced by Jain et al. in PEEKABOO: Interactive Video Generation via Masked-Diffusion

We use the Something-Something v2 dataset to obtain generation prompts and ground-truth masks from real action videos, and we filter it down to a set of 295 prompts. Details of this filtering are given in the paper "Peekaboo: Interactive Video Generation via Masked-Diffusion". We then use an off-the-shelf OWL-ViT-large open-vocabulary object detector to obtain bounding box (bbox) annotations of the object in each video. The resulting prompt and bbox pairs come from real-world videos. They serve as a test bed for evaluating both the quality and the controllability of methods that generate realistic videos with spatio-temporal control.
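The description above does not specify how the per-frame detector outputs become bbox annotations and masks. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes boxes are `(x_min, y_min, x_max, y_max)` in pixels, that the highest-scoring detection above a threshold is kept in each frame, and that a mask is the filled box. The names `PromptBBoxPair`, `top_box_per_frame`, `boxes_to_masks`, and `box_iou` are hypothetical, and so is the `min_score` value. OWL-ViT itself is not called here: its detections are mocked as lists of `(box, score)` tuples. The IoU helper shows one plausible way to compare a box in a generated frame against the ground-truth box when checking control.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

import numpy as np

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class PromptBBoxPair:
    """Hypothetical record for one sample: a prompt plus one box per frame."""
    prompt: str
    boxes: List[Optional[Box]]  # None where no detection passed the threshold


def top_box_per_frame(
    detections: Sequence[Sequence[Tuple[Box, float]]],
    min_score: float = 0.1,  # hypothetical threshold
) -> List[Optional[Box]]:
    """Keep the highest-scoring detection in each frame (assumed selection rule)."""
    boxes: List[Optional[Box]] = []
    for frame_dets in detections:
        kept = [(box, score) for box, score in frame_dets if score >= min_score]
        boxes.append(max(kept, key=lambda d: d[1])[0] if kept else None)
    return boxes


def boxes_to_masks(boxes: Sequence[Optional[Box]], height: int, width: int) -> np.ndarray:
    """Rasterize per-frame boxes into a (T, H, W) boolean mask volume."""
    masks = np.zeros((len(boxes), height, width), dtype=bool)
    for t, box in enumerate(boxes):
        if box is None:
            continue  # frame stays all-False
        x0, y0, x1, y1 = box
        # Round outward to whole pixels and clip to the frame.
        c0, r0 = max(0, int(np.floor(x0))), max(0, int(np.floor(y0)))
        c1, r1 = min(width, int(np.ceil(x1))), min(height, int(np.ceil(y1)))
        masks[t, r0:r1, c0:c1] = True
    return masks


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# Usage with mocked detections for a 3-frame, 64x64 clip (illustrative prompt).
dets = [
    [((10, 10, 30, 30), 0.9), ((0, 0, 5, 5), 0.3)],  # two candidates, keep 0.9
    [],                                              # nothing detected
    [((12, 8, 32, 28), 0.05)],                       # below threshold, dropped
]
pair = PromptBBoxPair("pushing a cup from left to right", top_box_per_frame(dets))
masks = boxes_to_masks(pair.boxes, height=64, width=64)
```

In this sketch, frames with no confident detection get an empty mask rather than an interpolated box. A real pipeline might instead interpolate or discard such clips, and that choice is not specified here.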

License

  • Unknown