3D Action Recognition
34 papers with code • 3 benchmarks • 14 datasets
Image: Rahmani et al.
Most implemented papers
Real-time 3D human action recognition based on Hyperpoint sequence
Instead of capturing spatio-temporal local structures, SequentialPointNet encodes the temporal evolution of static appearances to recognize human actions.
3DVNet: Multi-View Depth Prediction and Volumetric Refinement
Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps.
Deep Hierarchical Representation of Point Cloud Videos via Spatio-Temporal Decomposition
Specifically, a spatial operation is employed to capture the local structure of each spatial region in a tube and a temporal operation is used to model the dynamics of the spatial regions along the tube.
Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment
To that end, we propose to learn exercise-oriented image and video representations from unlabeled samples such that a small dataset annotated by experts suffices for supervised error detection.
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
Scene flow is a powerful tool for capturing the motion field of 3D point clouds.
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.
PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
To solve this problem, we present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module to enrich the receptive field of the model in spatial and temporal dimensions.
Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition
Furthermore, to leverage the complementarity of domain-shared features and target-specific features, we propose a novel collaborative clustering strategy to enforce pair-wise relationship consistency between the two branches.
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem.
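As a rough illustration of what "bidirectional knowledge distillation" between two modalities can look like, the sketch below computes a symmetric KL loss between the softened output distributions of two branches. This is a generic formulation under stated assumptions, not the CMD paper's actual objective: the function names, the `temperature` parameter, and the choice of plain KL divergence are all hypothetical choices made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_distillation_loss(logits_a, logits_b, temperature=1.0):
    """Symmetric distillation loss between two modality branches.

    Assumption: each branch acts as teacher for the other, so the loss is
    KL(a || b) + KL(b || a). In a real training loop the teacher side of
    each term would typically be detached from the gradient graph.
    """
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    loss_a_teaches_b = kl_divergence(p_a, p_b)
    loss_b_teaches_a = kl_divergence(p_b, p_a)
    return loss_a_teaches_b + loss_b_teaches_a


if __name__ == "__main__":
    # Hypothetical scores from two modality branches for one sample.
    joint_branch = [2.0, 0.5, -1.0]
    motion_branch = [1.5, 1.0, -0.5]
    print(bidirectional_distillation_loss(joint_branch, motion_branch, temperature=2.0))
```

The loss is zero when both branches agree and grows as their distributions diverge, which is the property that lets each modality pull the other toward its own view of the data.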