SSv2-Spatio-Temporal (Something Someting v2-Spatio-Temporal)

Introduced by Jain et al. in PEEKABOO: Interactive Video Generation via Masked-Diffusion

We use Something-Something v2 dataset to obtain the generation prompts and ground truth masks from real action videos. We filter out a set of 295 prompts. The details for this filtering are in the "Peekaboo: Interactive Video Generation via Masked-Diffusion" paper. We then use an off-the-shelf OWL-ViT-large open-vocabulary object detector to obtain the bounding box (bbox) annotations of the object in the videos. This set represents bbox and prompt pairs of real-world videos, serving as a test bed for both the quality and control of methods for generating realistic videos with spatio-temporal control.

Homepage

Benchmarks

Add a new result Link an existing benchmark

No benchmarks yet. Start a new benchmark or link an existing one.

Papers

Paper	Code	Results	Date	Stars

SSv2-Spatio-Temporal (Something Someting v2-Spatio-Temporal)

Benchmarks

Add a new result Link an existing benchmark

Papers

Dataset Loaders

Add Remove

Tasks

Usage

License

Modalities

Languages

SSv2-Spatio-Temporal (Something Someting v2-Spatio-Temporal)

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit

Usage

License Edit

Modalities Edit

Languages Edit

Benchmarks

Add a new result Link an existing benchmark

Dataset Loaders

Add Remove

Tasks

License

Modalities

Languages