Video Alignment
21 papers with code • 2 benchmarks • 4 datasets
Latest papers
Listen Then See: Video Alignment with Speaker Attention
Our approach exhibits an improved ability to leverage the video modality by using the audio modality as a bridge with the language modality.
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment
Based on T2VQA-DB, we propose a novel transformer-based model for subjective-aligned Text-to-Video Quality Assessment (T2VQA).
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI
To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance.
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
For video generation, various open-source models and publicly available services have been developed to generate high-quality videos.
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.
A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference
In this paper, we present a solution for enhancing video alignment to improve multi-step inference.
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Both PAAT and PAAB surpass their respective backbone Transformers by up to 9.8% in real-world action recognition and 21.8% in multi-view robotic video alignment.
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
Moreover, to fully unlock model capabilities for high-quality video generation and promote the development of the field, we curate a large-scale and open-source video dataset called HD-VG-130M.
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in IKEA assembly manuals) and (ii) video segments from in-the-wild videos, which comprise an enactment of the assembly actions in the real world.