Action Classification

227 papers with code • 24 benchmarks • 30 datasets

Most implemented papers

ECO: Efficient Convolutional Network for Online Video Understanding

mzolfaghari/ECO-efficient-video-understanding ECCV 2018

In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

rwightman/pytorch-image-models CVPR 2023

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Temporal Relational Reasoning in Videos

metalbubble/TRN-pytorch ECCV 2018

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
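The core mechanism of TRN is to score ordered tuples of frame features with a small network and aggregate the scores. A minimal numpy sketch of that idea (not the authors' implementation; `temporal_relation` and the stand-in scoring function are illustrative, where the paper uses a learned MLP):

```python
import numpy as np
from itertools import combinations

def temporal_relation(frame_feats, k=2, g=None):
    """Sum a relation score over all ordered k-tuples of frames.

    frame_feats: (T, D) array of per-frame features.
    g: scoring function over a concatenated k-tuple (an MLP in TRN);
       a plain sum is used here purely for illustration.
    """
    T, _ = frame_feats.shape
    if g is None:
        g = lambda x: x.sum()
    # combinations() yields index tuples in temporal order, so frame
    # ordering is preserved inside each tuple.
    return sum(g(np.concatenate([frame_feats[i] for i in idx]))
               for idx in combinations(range(T), k))

# 4 frames of 3-dim all-ones features: each 2-tuple scores 6, C(4,2)=6 tuples
score = temporal_relation(np.ones((4, 3)), k=2)  # → 36.0
```

In the full model, relations at several tuple sizes k are computed and summed, which lets the network reason over transformations at multiple temporal scales.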

Representation Flow for Action Recognition

piergiaj/representation-flow-cvpr19 CVPR 2019

Our representation flow layer is a fully-differentiable layer designed to capture the "flow" of any representation channel within a convolutional neural network for action recognition.

Revisiting 3D ResNets for Video Recognition

tensorflow/models 3 Sep 2021

Recent work by Bello et al. shows that training and scaling strategies may matter more than model architecture for visual recognition.

Masked Feature Prediction for Self-Supervised Visual Pre-Training

facebookresearch/SlowFast CVPR 2022

We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.
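The training objective in masked feature prediction is a regression loss computed only at masked token positions, with hand-crafted features (HOG in MaskFeat) as the target. A minimal numpy sketch of that loss, assuming precomputed per-token target features; `maskfeat_loss` is an illustrative name, not the repository's API:

```python
import numpy as np

def maskfeat_loss(pred, target, mask):
    """L2 regression restricted to masked positions.

    pred:   (N, D) model predictions for each token.
    target: (N, D) target features for each token (HOG features in MaskFeat).
    mask:   (N,) boolean array, True where the token was masked out of the input.
    """
    diff = (pred - target) ** 2
    # Unmasked (visible) tokens contribute nothing to the loss.
    return diff[mask].mean()

pred = np.zeros((4, 2))
target = np.ones((4, 2))
mask = np.array([True, False, True, False])
loss = maskfeat_loss(pred, target, mask)  # → 1.0
```

Restricting the loss to masked positions forces the model to infer the missing content from visible context rather than copy its input.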

CoCa: Contrastive Captioners are Image-Text Foundation Models

mlfoundations/open_clip 4 May 2022

We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the outputs of the multimodal decoder, which predicts text tokens autoregressively.
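The contrastive half of this objective is a symmetric cross-entropy over pairwise image-text similarities, with matching pairs on the diagonal. A minimal numpy sketch under that reading (the function name and temperature value are illustrative; the captioning loss is an ordinary autoregressive cross-entropy and is omitted here):

```python
import numpy as np

def coca_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss between unimodal embeddings.

    img_emb, txt_emb: (B, D) arrays; row i of each is a matching pair.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (B, B) similarity matrix
    labels = np.arange(len(img))              # matches lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

e = np.eye(4)
matched = coca_contrastive_loss(e, e)          # near zero: pairs align
mismatched = coca_contrastive_loss(e, e[::-1]) # large: pairs shuffled
```

Aligned pairs drive the loss toward zero while shuffled pairs are penalized, which is what pulls the two unimodal towers into a shared embedding space.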

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

whwu95/text4vis 4 Jul 2022

In this study, we focus on transferring knowledge for video classification tasks.

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

whwu95/BIKE CVPR 2023

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

chihyaoma/Activity-Recognition-with-CNN-and-RNN 30 Mar 2017

We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets, applied to spatiotemporal feature matrices, are able to exploit spatiotemporal dynamics to improve overall performance.
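The Temporal-ConvNet branch amounts to sliding a 1-D filter along the time axis of a T x D matrix of per-frame CNN features. A minimal numpy sketch of that operation (a hypothetical helper, assuming a single shared temporal filter; the paper's Temporal-Inception uses multiple learned filter sizes):

```python
import numpy as np

def temporal_conv(feats, kernel):
    """Apply one 1-D temporal convolution to every feature dimension.

    feats:  (T, D) spatiotemporal feature matrix (one row per frame).
    kernel: (K,) temporal filter shared across all D dimensions.
    Returns a (T-K+1, D) matrix of temporally aggregated features.
    """
    T, D = feats.shape
    return np.stack([np.convolve(feats[:, d], kernel, mode='valid')
                     for d in range(D)], axis=1)

# 5 frames of 3-dim features, averaged over a 2-frame window
out = temporal_conv(np.ones((5, 3)), np.array([0.5, 0.5]))  # shape (4, 3)
```

Stacking such filters at several temporal widths, as in Temporal-Inception, captures motion patterns at multiple time scales.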