Video Alignment
22 papers with code • 2 benchmarks • 4 datasets
Most implemented papers
Learning a Grammar Inducer from Massive Uncurated Instructional Videos
While previous work focuses on building systems for inducing grammars on text that is well-aligned with video content, we investigate the scenario in which text and video are only in loose correspondence.
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Sequential video understanding, as an emerging video understanding task, has attracted considerable research attention because of its goal-oriented nature.
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in IKEA assembly manuals) and (ii) segments from in-the-wild videos, where these videos comprise an enactment of the assembly actions in the real world.
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Our vid2vid-zero leverages off-the-shelf image diffusion models and does not require training on any video.
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Both PAAT and PAAB surpass their respective backbone Transformers by up to 9.8% in real-world action recognition and 21.8% in multi-view robotic video alignment.
A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference
In this paper, we present a solution for enhancing video alignment to improve multi-step inference.
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
For video generation, various open-source models and publicly available services have been developed to generate high-quality videos.
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance.