Action Recognition In Videos
64 papers with code • 17 benchmarks • 17 datasets
Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
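The task description above can be sketched as a simple interface: score each frame against a set of action classes, aggregate the scores over time, and return the best class. This is a minimal illustration only; the scorer below is a hypothetical stand-in (frames are single floats), whereas a real system would use a spatiotemporal model such as a 3D CNN.

```python
# Minimal sketch of the action-recognition interface.
# `score_frame` is a hypothetical per-frame classifier, not a real model.

ACTIONS = ["running", "jumping", "swimming"]

def score_frame(frame):
    """Return one score per action class for a single frame.

    `frame` is a stand-in feature (one float) so the example stays
    self-contained; in practice it would be an image tensor.
    """
    # Higher score = closer to an assumed class prototype (0.0, 1.0, 2.0).
    return [-abs(frame - proto) for proto in (0.0, 1.0, 2.0)]

def recognize_action(frames):
    """Average per-frame scores over time and pick the best class."""
    totals = [0.0] * len(ACTIONS)
    for frame in frames:
        for i, s in enumerate(score_frame(frame)):
            totals[i] += s
    best = max(range(len(ACTIONS)), key=lambda i: totals[i])
    return ACTIONS[best]
```

For example, `recognize_action([1.1, 0.9, 1.0])` maps the sequence to the class whose prototype it is closest to on average.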
Libraries
Use these libraries to find Action Recognition In Videos models and implementations.
Latest papers with no code
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition
Thanks to its automated design of network structures, neural architecture search (NAS) has achieved great success in the image processing field and has attracted substantial research attention in recent years.
Temporal Difference Networks for Action Recognition
To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition.
Developing Motion Code Embedding for Action Recognition in Videos
In this work, we propose a motion embedding strategy known as motion codes, a vectorized representation of motions based on a manipulation's salient mechanical attributes.
Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes
Prior works typically fail to address this problem in two respects: (1) they do not exploit scene information; (2) they lack training data for crowded and complex scenes.
Dynamic Sampling Networks for Efficient Action Recognition in Videos
Existing action recognition methods are mainly based on clip-level classifiers such as two-stream CNNs or 3D CNNs, which are trained on randomly selected clips and applied to densely sampled clips during testing.
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
Based on the probability space, we further generate new fusion strategies which achieve the state-of-the-art performance on four well-known action recognition datasets.
TEA: Temporal Excitation and Aggregation for Action Recognition
Temporal modeling is key for action recognition in videos.
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
In a nutshell, we treat the input frames and the network depth of the computational graph as a 2-dimensional grid, on which several checkpoints with a prediction module are placed in advance.
An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos
Traditionally, in deep learning based human activity recognition, either a few random frames or every $k^{th}$ frame of the video is used to train the 3D CNN, where $k$ is a small positive integer such as 4, 5, or 6.
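The two conventional frame-selection schemes this excerpt contrasts against can be written in a few lines; these helpers are illustrative, not the paper's proposed sampling technique.

```python
import random

def every_kth_frame(num_frames, k):
    """Indices of every k-th frame (fixed-stride sampling)."""
    return list(range(0, num_frames, k))

def random_frames(num_frames, n, rng=None):
    """Indices of n distinct frames chosen uniformly at random, sorted."""
    rng = rng or random
    return sorted(rng.sample(range(num_frames), n))
```

For a 20-frame video, `every_kth_frame(20, 5)` returns the four frames at stride 5, while `random_frames(20, 4)` returns four random (but temporally ordered) frame indices.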
Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues
The same action exhibits a wide range of intra-class variation while different actions show inter-class similarity, which makes action recognition in videos very challenging.