Video Alignment

21 papers with code • 2 benchmarks • 4 datasets

Video alignment is the task of temporally aligning frames across different videos of the same action or process, often captured from different viewpoints, typically by learning frame-wise embeddings in which corresponding moments map close together. It is widely used as a self-supervised signal for representation learning and for downstream applications such as imitation learning and fine-grained action understanding.

Most implemented papers

Time-Contrastive Networks: Self-Supervised Learning from Video

tensorflow/models 23 Apr 2017

While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human.
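The mechanism behind time-contrastive networks can be pictured as a triplet loss over synchronized multi-view frames: frames recorded at the same moment from different cameras are treated as positives, while temporally distant frames from the same camera are negatives. The sketch below is a minimal NumPy illustration of that idea; the embeddings, margin, and temporal offset are placeholders, not the paper's settings.

```python
import numpy as np

def tcn_triplet_loss(emb_view1, emb_view2, margin=0.2, offset=10):
    """Toy time-contrastive triplet loss.

    emb_view1, emb_view2: (T, D) L2-normalized frame embeddings from two
    synchronized viewpoints of the same video.
    Anchor   : frame t in view 1
    Positive : frame t in view 2 (same moment, different view)
    Negative : frame t+offset in view 1 (same view, different moment)
    """
    T = emb_view1.shape[0]
    losses = []
    for t in range(T - offset):
        anchor, pos, neg = emb_view1[t], emb_view2[t], emb_view1[t + offset]
        d_pos = np.sum((anchor - pos) ** 2)
        d_neg = np.sum((anchor - neg) ** 2)
        losses.append(max(0.0, d_pos - d_neg + margin))
    return float(np.mean(losses))

# Toy usage with random embeddings (illustration only).
rng = np.random.default_rng(0)
e1 = rng.normal(size=(100, 32)); e1 /= np.linalg.norm(e1, axis=1, keepdims=True)
e2 = rng.normal(size=(100, 32)); e2 /= np.linalg.norm(e2, axis=1, keepdims=True)
print(tcn_triplet_loss(e1, e2))
```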

Learning from Video and Text via Large-Scale Discriminative Clustering

jpeyre/unrel ICCV 2017

Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks.

Temporal Cycle-Consistency Learning

google-research/google-research CVPR 2019

We introduce a self-supervised representation learning method based on the task of temporal alignment between videos.
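Temporal cycle-consistency can be illustrated with a simple nearest-neighbour cycle: a frame of video A is matched to its closest frame in video B, that frame is matched back into A, and the frame counts as cycle-consistent if it returns to where it started. The paper optimizes a soft, differentiable version of this check; the NumPy sketch below only shows the hard-assignment evaluation.

```python
import numpy as np

def cycle_consistency_error(emb_a, emb_b):
    """Fraction of frames in video A that do not cycle back to themselves.

    emb_a: (Ta, D) and emb_b: (Tb, D) frame embeddings of two videos.
    Frame i in A -> nearest neighbour j in B -> nearest neighbour k in A;
    the frame is cycle-consistent if k == i.
    """
    errors = 0
    for i in range(len(emb_a)):
        j = np.argmin(np.sum((emb_b - emb_a[i]) ** 2, axis=1))
        k = np.argmin(np.sum((emb_a - emb_b[j]) ** 2, axis=1))
        errors += int(k != i)
    return errors / len(emb_a)
```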

View-Invariant Probabilistic Embedding for Human Pose

google-research/google-research ECCV 2020

Depictions of similar human body configurations can vary with changing viewpoints.

View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose

google-research/google-research 23 Oct 2020

Recognition of human poses and actions is crucial for autonomous systems to interact smoothly with people.

Dynamic Temporal Alignment of Speech to Lips

tavihalperin/AV-sync 19 Aug 2018

This alignment is based on deep audio-visual features, mapping the lip video and the speech signal to a shared representation.
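Once the lip video and the speech signal are embedded in a shared space, the alignment itself can be computed with dynamic time warping over the two feature sequences. The sketch below is a generic DTW routine over per-frame embeddings; it illustrates the alignment step only, not the paper's learned features or its exact alignment procedure.

```python
import numpy as np

def dtw_align(feats_a, feats_b):
    """Classic DTW over two (T, D) feature sequences; returns the total
    alignment cost and the warping path as (i, j) index pairs."""
    Ta, Tb = len(feats_a), len(feats_b)
    cost = np.full((Ta + 1, Tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            d = np.linalg.norm(feats_a[i - 1] - feats_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal warping path from the end of both sequences.
    path, i, j = [], Ta, Tb
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[Ta, Tb], path[::-1]
```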

Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video

mees/Adversarial-Skill-Networks 21 Oct 2019

Our method learns a general skill embedding independently from the task context by using an adversarial loss.
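The adversarial part can be pictured as two competing objectives: a discriminator tries to predict which task (context) a skill embedding came from, while the embedding network is penalized in the opposite direction so the embedding carries no task-identifying information. The snippet below only illustrates these competing loss terms with placeholder inputs; it is not the paper's specific formulation.

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy of a single example from raw logits."""
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def adversarial_embedding_terms(task_logits, true_task, weight=1.0):
    """Competing objectives in a generic adversarial embedding setup.

    task_logits: discriminator's prediction of task identity from a skill
    embedding. The discriminator minimizes `d_loss`; the embedding network
    adds `enc_penalty` (= -weight * d_loss) to its own loss, pushing it
    toward features the discriminator cannot use to tell tasks apart.
    """
    d_loss = cross_entropy(task_logits, true_task)
    enc_penalty = -weight * d_loss
    return d_loss, enc_penalty
```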

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

minghchen/carl_code CVPR 2022

In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner.
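A simplified way to picture frame-wise contrastive learning is an InfoNCE loss between two augmented views of the same video, where frame t in one view must match frame t in the other and all remaining frames act as negatives. CARL's actual sequence contrastive loss is more elaborate (it spreads the target over temporally nearby frames); the sketch below shows only this simplified frame-level variant.

```python
import numpy as np

def frame_infonce(emb_view1, emb_view2, temperature=0.1):
    """Toy frame-wise InfoNCE loss.

    emb_view1, emb_view2: (T, D) L2-normalized frame embeddings of two
    augmentations of the same video. Frame t of view 1 should be most
    similar to frame t of view 2; all other frames are negatives.
    """
    sims = emb_view1 @ emb_view2.T / temperature        # (T, T) similarity matrix
    sims = sims - sims.max(axis=1, keepdims=True)       # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))          # match frame t to frame t
```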

Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space

elicassion/3dtrl 23 Jun 2022

To this end, we propose a 3D Token Representation Layer (3DTRL) that estimates the 3D positional information of the visual tokens and leverages it for learning viewpoint-agnostic representations.
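The geometric step behind a viewpoint-agnostic token representation can be illustrated by lifting 2D token locations into 3D using per-token depth and an estimated camera transform, then using the 3D positions in place of image-plane positional encodings. In 3DTRL the depth and camera are estimated from the tokens themselves; the sketch below assumes they are given and only shows the standard pinhole back-projection.

```python
import numpy as np

def backproject_tokens(pixel_xy, depth, intrinsics, cam_to_world):
    """Lift 2D token locations into approximate 3D world coordinates.

    pixel_xy    : (N, 2) token centres in image coordinates
    depth       : (N,)   estimated per-token depth
    intrinsics  : (3, 3) camera intrinsic matrix K
    cam_to_world: (4, 4) estimated camera-to-world transform
    Returns (N, 3) positions usable as viewpoint-agnostic positional encodings.
    """
    ones = np.ones((pixel_xy.shape[0], 1))
    homog = np.hstack([pixel_xy, ones])                  # (N, 3) homogeneous pixels
    rays = (np.linalg.inv(intrinsics) @ homog.T).T       # camera-frame viewing rays
    cam_pts = rays * depth[:, None]                      # scale rays by depth
    cam_pts_h = np.hstack([cam_pts, ones])               # (N, 4) homogeneous points
    world_pts = (cam_to_world @ cam_pts_h.T).T[:, :3]    # transform into world frame
    return world_pts
```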