Action Classification

228 papers with code • 24 benchmarks • 30 datasets

Most implemented papers

Long-Term Feature Banks for Detailed Video Understanding

facebookresearch/video-long-term-feature-banks CVPR 2019

To understand the world, we humans constantly need to relate the present to the past, and put events in context.

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment

ParitoshParmar/MTL-AQA CVPR 2019

Can performance on the task of action quality assessment (AQA) be improved by exploiting a description of the action and its quality?
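The multitask idea is to let a description of the action supervise the same features used for scoring. Below is a minimal, hypothetical sketch of such a setup: a shared clip encoder feeding a quality-score regression head and a caption head. All module names, dimensions, and loss weights are illustrative and not the repository's actual API.

```python
import torch
import torch.nn as nn

class MultitaskAQA(nn.Module):
    """Toy multitask head: shared clip features drive both quality-score
    regression and caption generation (illustrative, not the paper's exact model)."""
    def __init__(self, feat_dim=512, vocab_size=5000, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(                     # stand-in for a 3D-CNN clip encoder
            nn.Linear(2048, feat_dim), nn.ReLU())
        self.score_head = nn.Linear(feat_dim, 1)           # AQA quality score
        self.caption_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.caption_out = nn.Linear(hidden, vocab_size)   # per-step word logits

    def forward(self, clip_feats, caption_len=12):
        f = self.backbone(clip_feats)                      # (B, feat_dim)
        score = self.score_head(f).squeeze(-1)             # (B,)
        steps = f.unsqueeze(1).repeat(1, caption_len, 1)   # repeat feature per word step
        h, _ = self.caption_rnn(steps)
        words = self.caption_out(h)                        # (B, T, vocab)
        return score, words

# Joint loss: regression on the judged score + cross-entropy on the description.
model = MultitaskAQA()
clips = torch.randn(4, 2048)
gt_score = torch.rand(4) * 10
gt_caption = torch.randint(0, 5000, (4, 12))
score, words = model(clips)
loss = nn.functional.mse_loss(score, gt_score) + \
       nn.functional.cross_entropy(words.reshape(-1, 5000), gt_caption.reshape(-1))
loss.backward()
```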

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

google-research/scenic 21 Jun 2021

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks.
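The core module learns a small set of spatial attention maps and pools the feature grid through each one to produce a token. A minimal sketch of that idea follows; the convolutional head and normalization are simplified assumptions, not the released Scenic implementation.

```python
import torch
import torch.nn as nn

class TokenLearner(nn.Module):
    """Sketch of the TokenLearner idea: predict S spatial attention maps and use
    each one to pool the HxW feature grid into a single learned token."""
    def __init__(self, in_channels, num_tokens=8):
        super().__init__()
        # Small conv head mapping features to one attention-logit map per token.
        self.attn = nn.Sequential(
            nn.Conv2d(in_channels, num_tokens, kernel_size=3, padding=1),
            nn.Conv2d(num_tokens, num_tokens, kernel_size=1))

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        maps = torch.sigmoid(self.attn(x))     # (B, S, H, W) spatial weights
        maps = maps.flatten(2)                 # (B, S, H*W)
        feats = x.flatten(2)                   # (B, C, H*W)
        # Weighted spatial pooling: each of the S maps yields one C-dim token.
        tokens = torch.einsum('bsp,bcp->bsc', maps, feats) / (h * w)
        return tokens                          # (B, S, C)

frame = torch.randn(2, 256, 14, 14)                    # e.g. a CNN/ViT feature map
print(TokenLearner(256, num_tokens=8)(frame).shape)    # torch.Size([2, 8, 256])
```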

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

MCG-NJU/VideoMAE 23 Mar 2022

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.
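The paper's data-efficient recipe masks an extremely high fraction of video patches with "tube" masking, i.e. the same spatial mask repeated across frames. Here is a minimal sketch of such a mask generator; the ratio and patch-grid shapes are illustrative, not the repository's configuration.

```python
import torch

def tube_mask(batch, time, height, width, mask_ratio=0.9):
    """Sketch of VideoMAE-style tube masking: sample one spatial mask over patch
    positions and repeat it across all frames, so a masked patch stays hidden in
    every frame of the clip."""
    num_patches = height * width
    num_keep = int(num_patches * (1 - mask_ratio))
    noise = torch.rand(batch, num_patches)               # random score per spatial position
    keep_idx = noise.argsort(dim=1)[:, :num_keep]        # lowest scores stay visible
    mask = torch.ones(batch, num_patches, dtype=torch.bool)
    mask.scatter_(1, keep_idx, False)                    # False = visible, True = masked
    # Repeat the same spatial mask along the temporal axis (the "tube").
    return mask.unsqueeze(1).expand(batch, time, num_patches)

m = tube_mask(batch=2, time=8, height=14, width=14, mask_ratio=0.9)
print(m.shape, m.float().mean().item())                  # (2, 8, 196), ~0.9 masked
```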

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

ruiwang2021/mvd CVPR 2023

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.
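At a high level, the student is trained to regress a frozen teacher's features at masked token positions. The sketch below shows that objective only, with illustrative shapes; it is not the repository's training loop, and the choice of image vs. video teacher is exactly the trade-off the sentence above describes.

```python
import torch

def masked_feature_distillation(student_pred, teacher_feats, mask):
    """Sketch of a masked-distillation objective: regress the (frozen) teacher's
    features, but only at masked token positions."""
    # student_pred, teacher_feats: (B, N, D); mask: (B, N) with 1 = masked token.
    loss = (student_pred - teacher_feats) ** 2
    loss = loss.mean(dim=-1)                        # per-token MSE
    return (loss * mask).sum() / mask.sum().clamp(min=1)

B, N, D = 2, 1568, 768
with torch.no_grad():
    teacher_feats = torch.randn(B, N, D)            # stand-in for frozen teacher outputs
student_pred = torch.randn(B, N, D, requires_grad=True)
mask = (torch.rand(B, N) > 0.1).float()             # ~90% of tokens masked
loss = masked_feature_distillation(student_pred, teacher_feats, mask)
loss.backward()
```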

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

alibaba/AliceMind 1 Feb 2023

In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.
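To make the composition idea concrete, here is a toy sketch of modality-specific modules feeding a shared "universal" module: the shared layers encourage collaboration across modalities while the separate encoders keep them disentangled. The module names, dimensions, and routing are illustrative assumptions, not mPLUG-2's actual architecture.

```python
import torch
import torch.nn as nn

class ModularizedModel(nn.Module):
    """Toy sketch of a modularized multi-modal network: per-modality encoders
    plus a shared universal module (illustrative, not the paper's exact design)."""
    def __init__(self, dim=256):
        super().__init__()
        self.text_module = nn.Linear(300, dim)     # stand-in for a text encoder
        self.image_module = nn.Linear(2048, dim)   # stand-in for an image encoder
        self.video_module = nn.Linear(1024, dim)   # stand-in for a video encoder
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.universal = nn.TransformerEncoder(enc_layer, num_layers=2)  # shared module

    def forward(self, text=None, image=None, video=None):
        parts = []
        if text is not None:
            parts.append(self.text_module(text))
        if image is not None:
            parts.append(self.image_module(image))
        if video is not None:
            parts.append(self.video_module(video))
        tokens = torch.cat(parts, dim=1)           # (B, total_tokens, dim)
        return self.universal(tokens)              # shared processing across modalities

model = ModularizedModel()
out = model(text=torch.randn(2, 16, 300), video=torch.randn(2, 32, 1024))
print(out.shape)                                   # torch.Size([2, 48, 256])
```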

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

mil-tokyo/FTGAN 27 Nov 2017

FlowGAN generates optical flow, which captures only the edges and motion of the videos to be generated.
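The hierarchy splits generation into motion first, texture second. Below is a toy two-stage sketch of that split: a flow generator maps noise to a flow volume, and a texture generator produces RGB frames conditioned on that flow. All layer choices and shapes are illustrative, not the repository's models (which are adversarially trained with separate discriminators).

```python
import torch
import torch.nn as nn

class TwoStageVideoGenerator(nn.Module):
    """Toy sketch of a FlowGAN -> TextureGAN split: generate an optical-flow
    (motion) volume from noise, then fill in per-frame texture conditioned on it."""
    def __init__(self, z_dim=100, frames=16, size=32):
        super().__init__()
        self.frames, self.size = frames, size
        # Stage 1: noise -> 2-channel flow field per frame (motion and edges only).
        self.flow_gen = nn.Sequential(
            nn.Linear(z_dim, frames * 2 * size * size), nn.Tanh())
        # Stage 2: flow -> 3-channel RGB frame (appearance and texture).
        self.texture_gen = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, z):                                   # z: (B, z_dim)
        b = z.size(0)
        flow = self.flow_gen(z).view(b * self.frames, 2, self.size, self.size)
        video = self.texture_gen(flow)                      # texture conditioned on flow
        return (flow.view(b, self.frames, 2, self.size, self.size),
                video.view(b, self.frames, 3, self.size, self.size))

flow, video = TwoStageVideoGenerator()(torch.randn(4, 100))
print(flow.shape, video.shape)    # (4, 16, 2, 32, 32) (4, 16, 3, 32, 32)
```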

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

demianzhang/weakly-action-localization CVPR 2018

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks.
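The weak supervision comes from video-level labels only: a per-segment attention weight pools segment features into a video representation for classification, and a sparsity term keeps the attention concentrated on the action. A minimal sketch of that recipe follows; the dimensions, loss weight, and attention head are illustrative assumptions, not the repository's code.

```python
import torch
import torch.nn as nn

class SparseTemporalPooling(nn.Module):
    """Sketch of sparse temporal pooling for weakly supervised localization:
    attention-weighted average of segment features, classified with only
    video-level labels."""
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())       # one weight per segment
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, segments):                   # segments: (B, T, feat_dim)
        attn = self.attention(segments)            # (B, T, 1)
        pooled = (attn * segments).sum(1) / attn.sum(1).clamp(min=1e-6)
        return self.classifier(pooled), attn.squeeze(-1)

model = SparseTemporalPooling()
feats = torch.randn(2, 400, 1024)                  # pre-extracted segment features
labels = torch.tensor([3, 7])                      # video-level class labels only
logits, attn = model(feats)
# Video-level classification loss + L1 sparsity on the attention weights;
# at test time, thresholding the attention/class activations localizes actions.
loss = nn.functional.cross_entropy(logits, labels) + 1e-4 * attn.abs().mean()
loss.backward()
```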

Timeception for Complex Action Recognition

noureldien/timeception CVPR 2019

This paper focuses on the temporal aspect for recognizing human activities in videos; an important visual cue that has long been undervalued.

VideoBERT: A Joint Model for Video and Language Representation Learning

ammesatyajit/VideoBERT ICCV 2019

Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube.