Action Localization
131 papers with code • 0 benchmarks • 3 datasets
Action Localization is the task of finding the spatial and temporal coordinates of an action in a video. An action localization model identifies the frames in which an action starts and ends, and returns the x, y coordinates of the action in each of those frames. These coordinates change over time as the object performing the action moves.
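The task definition above can be made concrete with a small sketch of what a localization result might look like. This is an illustrative data structure with hypothetical field names, not the output format of any particular model: one detected action instance carries a temporal extent (start and end frame) plus a per-frame bounding box, since the spatial coordinates shift as the actor moves.

```python
# Hypothetical output for one detected action instance: temporal
# boundaries plus a per-frame (x, y, w, h) box for the moving actor.
detection = {
    "label": "jumping",            # action category
    "score": 0.91,                 # model confidence
    "start_frame": 120,            # temporal boundary: action starts here
    "end_frame": 122,              # temporal boundary: action ends here
    "boxes": {                     # spatial track, one box per frame
        120: (64, 40, 32, 96),
        121: (66, 40, 32, 96),
        122: (68, 41, 32, 96),
    },
}

def box_at(det, frame):
    """Return the actor's box at a frame, or None if the action is inactive."""
    if det["start_frame"] <= frame <= det["end_frame"]:
        return det["boxes"].get(frame)
    return None
```

Evaluation protocols for this task typically compare such predicted segments and boxes against ground truth using temporal and spatio-temporal IoU.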
Benchmarks
These leaderboards are used to track progress in Action Localization.
Libraries
Use these libraries to find Action Localization models and implementations.
Latest papers
Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach
It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies the cluster as foreground or background.
Unsupervised Temporal Action Localization via Self-paced Incremental Learning
Thereafter, we design two (constant-speed and variable-speed) incremental instance-learning strategies for easy-to-hard model training, ensuring the reliability of the video pseudo-labels and further improving overall localization performance.
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
We first benchmark MM-Navigator on our collected iOS screen dataset.
Temporal Action Localization with Enhanced Instant Discriminability
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
HR-Pro: Point-supervised Temporal Action Localization via Hierarchical Reliability Propagation
For snippet-level learning, we introduce an online-updated memory to store reliable snippet prototypes for each class.
DDG-Net: Discriminability-Driven Graph Network for Weakly-supervised Temporal Action Localization
Considering this phenomenon, we propose Discriminability-Driven Graph Network (DDG-Net), which explicitly models ambiguous snippets and discriminative snippets with well-designed connections, preventing the transmission of ambiguous information and enhancing the discriminability of snippet-level representations.
NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023
This report describes our submission to the Ego4D Moment Queries Challenge 2023.
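The entry above turns on tuning the non-maximum suppression (NMS) threshold for temporal action proposals. As a rough illustration of what that threshold controls, here is a minimal sketch of greedy 1D temporal NMS over (start, end, score) segments; the function name and exact formulation are assumptions for illustration, not the challenge submission's actual code.

```python
def temporal_nms(segments, iou_threshold=0.5):
    """Greedy 1D NMS: keep high-scoring segments, drop overlapping ones.

    segments: list of (start, end, score) tuples.
    iou_threshold: temporal IoU above which a lower-scoring segment
    is suppressed -- the hyperparameter the report above tunes.
    """
    # Process candidates from highest to lowest confidence.
    ordered = sorted(segments, key=lambda s: s[2], reverse=True)
    kept = []
    for start, end, score in ordered:
        suppressed = False
        for ks, ke, _ in kept:
            inter = max(0.0, min(end, ke) - max(start, ks))
            union = (end - start) + (ke - ks) - inter
            if union > 0 and inter / union > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append((start, end, score))
    return kept
```

A lower threshold prunes proposals more aggressively (fewer, more distinct detections); a higher one retains more overlapping candidates, which can help recall on metrics that average over many IoU levels.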
Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training.
Boosting Weakly-Supervised Temporal Action Localization with Text Information
For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label, and regards the text as the query to mine all class-related segments.
Weakly-Supervised Temporal Action Localization with Bidirectional Semantic Consistency Constraint
The proposed Bi-SCC first applies a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions across videos; then a semantic consistency constraint (SCC) enforces consistent predictions between the original and augmented videos, thereby suppressing the co-scene actions.