Action Recognition In Videos

64 papers with code • 17 benchmarks • 17 datasets

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
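As a rough illustration of the mapping described above, the sketch below classifies a clip by averaging per-frame class scores over time and picking the most probable class. It is a toy baseline under stated assumptions, not any listed paper's method: the per-frame scores are assumed to come from some upstream model, and the names `ACTION_CLASSES`, `softmax`, and `classify_clip` are hypothetical.

```python
import math

# Example labels taken from the task description; a real label set is dataset-specific.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def softmax(scores):
    """Convert unnormalized scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_scores, classes=ACTION_CLASSES):
    """Average per-frame class scores over time, then pick the best class.

    frame_scores: list of frames, each a list with one score per class.
    Returns (predicted_label, class_probabilities).
    """
    if not frame_scores:
        raise ValueError("clip must contain at least one frame")
    num_frames = len(frame_scores)
    # Temporal average pooling: one mean score per class across all frames.
    mean_scores = [
        sum(frame[c] for frame in frame_scores) / num_frames
        for c in range(len(classes))
    ]
    probs = softmax(mean_scores)
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best], probs
```

Averaging over frames discards temporal order, which is the limitation that many of the spatiotemporal models listed on this page are designed to address.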

Most implemented papers

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

wei-tim/YOWO 15 Nov 2019

YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.

R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

VisionLearningGroup/R-C3D ICCV 2017

We address the problem of activity detection in continuous, untrimmed video streams.

What Makes Training Multi-Modal Classification Networks Hard?

facebookresearch/R2Plus1D CVPR 2020

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart.

Gating Revisited: Deep Multi-layer RNNs That Can Be Trained

0zgur0/STAR_Network 25 Nov 2019

We propose a new STAckable Recurrent cell (STAR) for recurrent neural networks (RNNs), which has fewer parameters than widely used LSTM and GRU while being more robust against vanishing or exploding gradients.

Action Recognition using Visual Attention

kracwarlock/action-recognition-visual-attention 12 Nov 2015

We propose a soft attention based model for the task of action recognition in videos.
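A minimal sketch of what soft attention over spatial locations generally looks like: location scores are normalized with a softmax and used to form a weighted average of the location features. This is an illustration under assumptions, not the paper's implementation; here the scores are passed in directly, whereas a full model would predict them at each time step, and the function name `soft_attention` is hypothetical.

```python
import math


def soft_attention(features, scores):
    """Weight location features by a softmax over their scores.

    features: list of L feature vectors (one per spatial location).
    scores:   list of L unnormalized attention scores.
    Returns (weights, attended_feature).
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one score per location and at least one location")
    # Softmax over locations (shifted by the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Expected feature: weighted sum of the location features.
    dim = len(features[0])
    attended = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return weights, attended
```

Because the weights are a differentiable function of the scores, this kind of attention can be trained end to end with ordinary backpropagation, unlike hard attention, which samples a single location.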

2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning

dluvizon/deephar CVPR 2018

Action recognition and human pose estimation are closely related but both problems are generally handled as distinct tasks in the literature.

Resource Efficient 3D Convolutional Neural Networks

okankop/Efficient-3DCNNs 4 Apr 2019

Recently, convolutional neural networks with 3D kernels (3D CNNs) have been very popular in the computer vision community as a result of their superior ability to extract spatio-temporal features within video frames compared to 2D CNNs.
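To make the idea of a 3D kernel concrete, the sketch below applies a single-channel 3D convolution (strictly, a cross-correlation with no padding and stride 1) to a tiny clip stored as nested lists. It is a deliberately naive illustration, not code from the paper; the function name `conv3d_valid` and the `clip[t][y][x]` layout are assumptions made for this example.

```python
def conv3d_valid(clip, kernel):
    """Naive single-channel 3D cross-correlation, no padding, stride 1.

    clip:   nested list indexed as clip[t][y][x] (time, height, width).
    kernel: nested list indexed as kernel[dt][dy][dx].
    Returns a nested list of shape (T-kT+1, H-kH+1, W-kW+1).
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                acc = 0.0
                # The kernel spans time as well as space, so each output
                # value mixes information from several consecutive frames.
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# A 2x1x1 kernel that subtracts the previous frame from the next one.
TEMPORAL_DIFF = [[[-1.0]], [[1.0]]]
```

With `TEMPORAL_DIFF`, a static clip produces all zeros while a pixel that changes between frames produces a nonzero response, which is the kind of motion cue a 2D kernel applied to a single frame cannot capture.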

Learning Video Representations from Correspondence Proposals

xingyul/cpnet CVPR 2019

In particular, the proposed network (CPNet) can effectively learn representations for videos by mixing appearance and long-range motion with an RGB-only input.

IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos

mks0601/IntegralAction_RELEASE 13 Jul 2020

Most current action recognition methods heavily rely on appearance information by taking an RGB sequence of entire image regions as input.

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

BestJuly/Inter-intra-video-contrastive-learning 6 Aug 2020

With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.