Self-Supervised Action Recognition
34 papers with code • 6 benchmarks • 5 datasets
Latest papers with no code
A Large-Scale Analysis on Self-Supervised Video Representation Learning
In this work, we first provide a benchmark that enables a comparison of existing approaches on equal footing.
Self-supervised Contrastive Learning for Audio-Visual Action Recognition
To learn supervisory signals from unlabeled videos, we propose a novel self-supervised contrastive learning module (SelfCL).
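A minimal sketch of the generic contrastive (InfoNCE-style) objective that modules like this build on, not the paper's exact SelfCL formulation; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss over a batch.

    anchors, positives: (N, D) L2-normalised embeddings. The i-th row of
    `positives` is the matching view of the i-th anchor; every other row
    in the batch serves as a negative.
    """
    logits = anchors @ positives.T / temperature        # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Matching pairs lie on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_probs))

# Toy usage: two identical views of 4 samples in an 8-dim embedding space,
# so the loss should be close to zero.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = info_nce_loss(z, z)
```

In the audio-visual setting, the two "views" would typically be embeddings of the audio and visual streams of the same clip.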
Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training
Recently, much progress has been made for self-supervised action recognition.
Self-Supervised Video Representation Learning with Meta-Contrastive Network
Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.
Self-Supervised Learning via multi-Transformation Classification for Action Recognition
We use the learned models in pretext tasks as the pre-trained models and fine-tune them to recognize human actions in the downstream task.
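The pretrain-then-fine-tune pattern described above can be sketched as follows; `Backbone` and `build_finetune_model` are hypothetical stand-ins (in practice the backbone would be a 3D CNN trained on the pretext task), not the paper's implementation.

```python
import numpy as np

class Backbone:
    """Stand-in feature extractor, assumed already trained on a pretext task."""
    def __init__(self, dim_in, dim_feat, rng):
        self.w = rng.normal(scale=0.1, size=(dim_in, dim_feat))

    def features(self, x):
        return np.maximum(x @ self.w, 0.0)  # ReLU features

def build_finetune_model(backbone, num_actions, rng):
    """Attach a fresh action-classification head to the pretrained features.

    Fine-tuning then updates both the head and (optionally) the backbone
    on the labeled downstream action-recognition data.
    """
    head = rng.normal(scale=0.1, size=(backbone.w.shape[1], num_actions))
    def logits(x):
        return backbone.features(x) @ head
    return logits

# Usage: a batch of 2 inputs mapped to logits over 10 action classes.
rng = np.random.default_rng(0)
backbone = Backbone(dim_in=16, dim_feat=32, rng=rng)
model = build_finetune_model(backbone, num_actions=10, rng=rng)
scores = model(np.zeros((2, 16)))
```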
Evolving Losses for Unsupervised Video Representation Learning
We present a new method to learn video representations from large-scale unlabeled video data.
Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking
Deep neural networks require collecting and annotating large amounts of data to train successfully.
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction
Our method can learn the spatiotemporal representation of the video by predicting the order of shuffled clips from the video.
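A clip-order-prediction pretext task of this kind can be set up as below: shuffle a tuple of clips and use the index of the applied permutation as the classification label. This is a minimal sketch of the general idea, not the paper's pipeline; the function name is an assumption.

```python
import itertools
import random

def make_order_prediction_sample(clips, rng=random):
    """Build one training sample for a clip-order-prediction pretext task.

    `clips` is a temporally ordered list of short clips (any objects,
    e.g. frame tensors). Returns the shuffled clips plus the index of
    the applied permutation, which serves as the classification label.
    """
    perms = list(itertools.permutations(range(len(clips))))
    label = rng.randrange(len(perms))
    shuffled = [clips[i] for i in perms[label]]
    return shuffled, label

# Usage: 3 clips yield a 6-way (3!) ordering-classification problem.
clips = ["clip_a", "clip_b", "clip_c"]
shuffled, label = make_order_prediction_sample(clips, random.Random(42))
```

A network trained to predict `label` from `shuffled` must capture temporal structure, which is the representation the pretext task is after.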
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
The success of deep neural networks generally requires vast amounts of labeled training data, which is expensive and infeasible at scale, especially for video collections.
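Rotation prediction as a pretext task amounts to rotating every frame of a clip by a multiple of 90 degrees and training a network to recover the rotation index. A minimal sketch under that reading; the helper names are assumptions, not the paper's API.

```python
import numpy as np

def rotate_clip(clip, k):
    """Rotate every frame of a clip by k * 90 degrees.

    clip: (T, H, W, C) array of frames. The rotation index k in
    {0, 1, 2, 3} doubles as the pretext-task label the network
    is trained to predict.
    """
    return np.rot90(clip, k=k, axes=(1, 2))

def make_rotation_sample(clip, rng):
    """Sample a random rotation and return (rotated clip, label)."""
    k = int(rng.integers(0, 4))
    return rotate_clip(clip, k), k

# Usage: 8 frames of 32x32 RGB.
clip = np.zeros((8, 32, 32, 3))
rotated, label = make_rotation_sample(clip, np.random.default_rng(0))
```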
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Self-supervised tasks such as colorization, inpainting, and jigsaw puzzles have been used for visual representation learning on still images when labeled images are limited or entirely absent.