Video Alignment

21 papers with code • 2 benchmarks • 4 datasets

Latest papers with no code

Listen Then See: Video Alignment with Speaker Attention

no code yet • 21 Apr 2024

Our approach exhibits an improved ability to leverage the video modality by using the audio modality as a bridge with the language modality.

AniClipart: Clipart Animation with Text-to-Video Priors

no code yet • 18 Apr 2024

To generate cartoon-style and smooth motion, we first define Bézier curves over keypoints of the clipart image as a form of motion regularization.

Scaling Up Video Summarization Pretraining with Large Language Models

no code yet • 4 Apr 2024

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem.

The Effects of Short Video-Sharing Services on Video Copy Detection

no code yet • 26 Mar 2024

From the experimental results focusing on segment-level and video-level situations, we can see three effects: "Segment-level VCD in short video-sharing services is more difficult than in general video-sharing services", "Video-level VCD in short video-sharing services is easier than in general video-sharing services", and "The video alignment component mainly suppresses the detection performance in short video-sharing services".

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

no code yet • 18 Mar 2024

To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility.

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

no code yet • 10 Mar 2024

By leveraging the self-consistency property of CMs, we eliminate the need for time-consuming inversion or additional condition extraction, reducing editing time.

Towards A Better Metric for Text-to-Video Generation

no code yet • 15 Jan 2024

Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore in offering a better metric for text-to-video generation.

STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment

no code yet • 12 Oct 2023

Continuously learning a variety of audio-video semantics over time is crucial for audio-related reasoning tasks in our ever-evolving world.

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

no code yet • ICCV 2023

Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment.

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer

no code yet • 26 Jun 2023

However, most previous works treat the live stream as a whole item and explore the click-through rate (CTR) prediction framework at the item level, neglecting the dynamic changes that occur even within the same live room.