Video Alignment

22 papers with code • 2 benchmarks • 4 datasets

Most implemented papers

Learning a Grammar Inducer from Massive Uncurated Instructional Videos

Sy-Zhang/MMC-PCFG 22 Oct 2022

While previous work focuses on building systems for inducing grammars on text that is well aligned with video content, we investigate the scenario in which text and video are only in loose correspondence.

Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

svip-lab/weaksvr CVPR 2023

Sequential video understanding, as an emerging video understanding task, has drawn lots of researchers' attention because of its goal-oriented nature.

Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

DavidZhang73/AssemblyVideoManualAlignment CVPR 2023

In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) and (ii) video segments from in-the-wild videos; these videos comprising an enactment of the assembly actions in the real world.

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

baaivision/vid2vid-zero 30 Mar 2023

Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.

Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

daooshee/hd-vg-130m 18 May 2023

Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.

Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers

dominickrei/poseawarevt 15 Jun 2023

Both PAAT and PAAB surpass their respective backbone Transformers by up to 9.8% in real-world action recognition and 21.8% in multi-view robotic video alignment.

A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference

zcfinal/loveu-cvpr23-aqtc 26 Jun 2023

In this paper, we present a solution for enhancing video alignment to improve multi-step inference.

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

showlab/show-1 27 Sep 2023

In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

EvalCrafter/EvalCrafter 17 Oct 2023

For video generation, various open-sourced models and public-available services have been developed to generate high-quality videos.

AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

benchcouncil/aigcbench 3 Jan 2024

To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance.