Temporal Action Localization
421 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect action instances in an untrimmed video stream and output their beginning and end timestamps. It is closely related to Temporal Action Proposal Generation.
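As a minimal illustration of the task's output, a localized action can be represented as a (start, end) interval, and predictions are usually matched to ground truth via temporal intersection-over-union (tIoU). The sketch below assumes plain tuples of timestamps in seconds; the segment values and threshold are illustrative, not drawn from any specific benchmark.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A detection typically also carries a class label and a confidence score;
# a prediction is counted correct when its tIoU with a ground-truth segment
# of the same class exceeds a threshold (e.g. 0.5).
pred = (12.0, 18.0)   # predicted action segment
gt = (10.0, 17.0)     # annotated ground-truth segment
print(temporal_iou(pred, gt))  # → 0.625
```

Benchmark metrics such as mean Average Precision are then computed over detections at one or several tIoU thresholds.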
Libraries
Use these libraries to find Temporal Action Localization models and implementations.
Datasets
Subtasks
Latest papers with no code
DeepLocalization: Using change point detection for Temporal Action Localization
In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior.
Learning to Score Sign Language with Two-stage Method
Human action recognition and performance assessment have been hot research topics in recent years.
Leveraging Temporal Contextualization for Video Action Recognition
We propose Temporal Contextualization (TC), a novel layer-wise temporal information infusion mechanism for video that extracts core information from each frame, interconnects relevant information across the video to summarize into context tokens, and ultimately leverages the context tokens during the feature encoding process.
Exploring Explainability in Video Action Recognition
To address these, we introduce Video-TCAV, which builds on TCAV for image classification tasks and aims to quantify the importance of specific concepts in the decision-making process of Video Action Recognition models.
Multimodal Attack Detection for Action Recognition Models
In addition, we analyze our method's real-time performance with different hardware setups to demonstrate its potential as a practical defense mechanism.
MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition
Our approach takes spatial features of different scales extracted by CNN and feeds them into a Multi-scale Embedding Layer (MELayer).
Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder
This study is the first to conduct end-to-end temporal action localization in untrimmed videos of infants with ASD, offering promising avenues for early intervention and support.
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
PhysPT exploits a Transformer encoder-decoder backbone to effectively learn human dynamics in a self-supervised manner.
Language Model Guided Interpretable Video Action Reasoning
Extensive experiments on two complex video action datasets, Charades & CAD-120, validate the improved performance and interpretability of our LaIAR framework.
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video.